Introduction
Driving Issues
Conducting design evaluations and assembly feasibility assessments in immersive virtual environments (VEs) allows designers to assess and validate design alternatives more efficiently and cost-effectively than traditional mock-ups, while providing deeper insights than static drawings. Design review has emerged as a key application of VEs, enabling the exploration of critical design questions through the use of virtual models, such as:
• Can an artifact readily be assembled?
• Can repairers readily service it?
The ideal virtual environment (VE) system immerses participants in a fully realistic environment, making them believe they are genuinely performing a task. Every aspect of the task is meticulously replicated, including visual elements, accurate sounds, and identical odors. When participants reach out to touch objects, they experience realistic sensations, with parts and tools exhibiting weight and appropriate handling. This level of interaction ensures that users engage with virtual objects as they would in reality, while those objects respond accurately to their actions. Such an optimal training and simulation environment mirrors the fictional Holodeck from Star Trek, where users are completely absorbed in a computer-generated world, experiencing consequences as if they were real.
Current virtual environments (VEs) significantly deviate from this ideal system, as they primarily consist of virtual objects that lack physical interaction. While this approach has advantages, such as the ability to simulate scenarios like bullet trajectories, it poses challenges for hands-on tasks like assembly and servicing. The absence of tactile feedback and manual affordances in virtual models undermines their effectiveness, making them less practical for tasks that require realistic interactions. For instance, using a six degree-of-freedom (DOF) wand to mimic a wrench fails to provide an authentic experience, which is particularly evident when attempting to perform simple tasks like unscrewing an oil filter from a car engine in a virtual environment.
Interacting with virtual objects in virtual environments (VEs) is constrained by two main factors. First, the challenge of accurately modeling and tracking users and real objects limits the ability of virtual objects to respond effectively. Second, due to the limited information available regarding the shape, appearance, and movement of users and real objects, the visual representation of these elements in the VE tends to be stylized rather than an accurate depiction.
In a virtual environment, users are embodied as avatars, which are often depicted as stylized virtual human models. Commercial packages like EDS's Jack provide these representations, enhancing the immersive experience for users.
While models like [Ward01] or Curious Lab's Poser 4 [Simone99] offer significant detail, they often fail to accurately represent individual participants' appearances. Research suggests that this lack of visual alignment can negatively impact the effectiveness of virtual environments (VEs), diminishing participants' belief in the virtual world and their sense of presence.
We broaden the concept of an avatar to encompass a virtual representation of any tangible object. These real-object avatars are linked to their physical counterparts and are designed to closely mimic their shape, appearance, and movement.
To effectively model, track, and interact with real objects like users' hands and specialized tools, specific development is essential. In our user study's purely virtual environment, we enabled users to pick up and manipulate virtual blocks, necessitating the creation of code for tracked pinch gloves and interaction mechanisms among virtual objects. This involved defining and implementing every possible input, action, and model for both real and virtual entities, resulting in a system that dictated specific user interactions with the blocks. Additionally, any modifications to the virtual environment required significant updates to the code or database.
Designers often limit the use of real objects in virtual environments (VEs) due to the significant development effort required and the challenges associated with object tracking and modeling. Additionally, certain types of real objects, particularly those that are highly deformable like bushy plants, pose considerable difficulties in terms of accurate modeling and tracking within a VE.
Training with virtual objects may negatively impact performance in tasks that rely on haptic feedback and natural affordances. For instance, using real tools instead of virtual simulations can significantly enhance the effectiveness of training with complex tools.
Incorporating dynamic real objects into virtual environments can significantly enhance interactivity and create visually accurate representations. Dynamic objects, such as a socket wrench set, clothing, and the human hand, can deform, change topology, and alter their appearance. In assembly verification tasks, both the user and the tools involved are often dynamic, changing in shape, motion, and appearance. Integrating these dynamic real objects can improve task performance and user presence in various virtual environments. Additionally, engaging with real objects offers better affordance matching and tactile feedback, further enriching the user experience.
Incorporating real objects into virtual environments (VEs) involves visualizing these objects and enabling virtual counterparts to respond to them. Key challenges include effectively representing real objects within the VE and managing the interactions between the real and virtual elements.
Designers can enhance their workflow by integrating real objects with virtual models, allowing them to assess spatial accessibility and train individuals in assembling models at various stages. This approach utilizes actual components and tools, accommodating the diverse skills of participants.
Today, neither standard tracking technologies nor modeling techniques are up to doing this in real time at interactive rates.
Incorporating dynamic real objects into virtual environments (VEs) involves capturing their shape and appearance for seamless integration. Our system generates approximate virtual models of these dynamic objects in real time by utilizing multiple outside-looking-in cameras to calculate shape information. Additionally, a camera aligned with the user's perspective captures the object's appearance, ensuring an immersive experience.
Video capture of real objects offers a significant advantage in enhancing visual realism. By accurately displaying a user's arm with realistic lighting, pigmentation, and clothing when it enters the field of view, the system can create a visually faithful avatar in real time. This capability allows for a more immersive and authentic user experience.
Thesis Statement
We set out to prove the following:
Naturally interacting with real objects in immersive virtual environments improves task performance and sense-of-presence in cognitive tasks.
Our study results showed a significant task performance improvement, but did not show a significant difference in sense-of-presence.
Overall Approach
We have created a hybrid environment system that utilizes image-based object reconstruction algorithms to generate real-time virtual avatars of real objects. Participants can visually see themselves and any real objects integrated into the virtual environment. Additionally, they can physically interact with real objects while engaging with virtual elements. Our image-based algorithm eliminates the need for prior modeling and effectively handles dynamic objects, making it well suited for assembly-design tasks.
Our system leverages commodity graphics-card hardware to accelerate the computation of a virtual approximation, known as the visual hull, of real-world objects. While current graphics hardware supports a limited range of operations compared to general CPUs, it executes these operations at remarkable speeds; for instance, the nVidia GeForce4 can render over 75 million 3-D triangles and display more than 1.2 billion pixels per second. By utilizing these computations and common graphics memory buffers, such as the frame buffer and stencil buffer, we generate real-time virtual representations of scene objects from arbitrary perspectives. Our approach breaks down the 3-D visual hull problem into manageable 2-D tasks that can be solved efficiently using the specialized computational power of graphics hardware.
To create a virtual representation of a real object, we begin by capturing its shape and appearance, followed by rendering it within the virtual environment (VE). This virtual representation can interact with and influence other virtual objects. Each object is modeled as a visual hull derived from multiple camera views, with texture mapping applied to reflect the real object's lit appearance. The resulting avatars are composited with virtual objects, ensuring correct mutual occlusion.
Real-object avatars are textured using images from a camera mounted on the head-mounted display (HMD), providing users with a highly accurate virtual representation of themselves. This ensures that the avatars reflect precise appearances, joint movements, and shape deformations in real time, enhancing the immersive experience for participants.
We have created algorithms that enable the use of virtual representations of real objects in simulations involving virtual lighting and physics-based mechanics. These advancements feature collision-detection and collision-response algorithms that leverage graphics hardware for real-time computations. Additionally, these real-object avatars can interact with and influence simulations related to visibility and illumination.
Real-object avatars can interact with virtual environments through features such as virtual lighting, shadows, and collision detection. These interactions enable avatars to influence simulations, including particle systems, cloth dynamics, and rigid-body physics, enhancing the overall immersive experience.
In our oil filter example, we can identify whether the actual oil filter the user holds intersects with the virtual engine model. This interaction allows the user's hand to cast a shadow onto the virtual engine and enables the user to brush aside a virtual wire while reaching for a specific area. Essentially, we are blending physical space with real objects and virtual space with corresponding digital elements.
In our investigation of user studies involving real objects in virtual environments (VEs), we aimed to assess how haptic feedback and the visual fidelity of avatars influence both task performance and the sense of presence. Our focus was primarily on cognitive tasks, examining the interplay between these factors and their impact on user experience.
• Will task performance significantly improve if participants interact with real objects instead of purely virtual objects?
• Will sense-of-presence significantly improve when participants are represented by visually faithful self-avatars?
Cognitive tasks, which involve problem-solving and decision-making, are distinct from perceptual motor tasks; for instance, choosing to pick up a red pen instead of just any pen. Most design verification and training tasks fall under the cognitive category. Research indicates that using immersive virtual environments (VEs) enhances efficiency in assembly planning and design compared to traditional blueprints or 3-D models displayed on monitors.
In our user study, we examined the two hypotheses by evaluating a block arrangement task across three different systems: a purely virtual task system and two hybrid systems with varying levels of visual fidelity, using a real-space task system as a baseline. We measured task performance by comparing the time participants took to complete the task in the virtual environment (VE) against their completion time in real space. Our goal was to determine the extent to which interaction with physical objects improved overall performance.
The results show a statistically significant improvement in task performance measures for interacting with real objects within a VE compared to interacting with virtual objects (Figure 1).
In a study examining task performance in virtual environments (VEs) under various interaction conditions, the Real Space condition served as the baseline. Participants in the purely virtual condition manipulated virtual objects, while those in both the Hybrid and Visually Faithful Hybrid conditions manipulated real objects.
For presence comparison, we used the following explicit definition of presence from Slater and Usoh [Slater93]:
Human participants in a virtual environment can be deeply convinced that they are in a different location than their physical surroundings. This immersive experience is driven by the combination of visual imagery, sound, and tactile sensations generated by the computer-synthesized environment, effectively engaging their senses and altering their perception of reality.
We administered a presence questionnaire and interviewed participants after they completed the experience.
In our study, we examined the differences in responses between virtual environments (VEs) featuring generic avatars and those with personalized avatars. The findings indicated a slight, though not statistically significant, increase in participants' self-reported sense of presence within the hybrid environment compared to the purely virtual environment, as illustrated in Figure 2.
Figure 2 – Mean Steed-Usoh-Slater sense-of-presence questionnaire scores for the different VE conditions. VFHE had visually faithful avatars, while HE and PVE had generic avatars.
Application to an Assembly Verification Task
We wanted to apply this system to a real-world problem.
We initiated a collaboration with the payload-engineering team at NASA Langley Research Center in Hampton, Virginia. In an initial exploratory study, four experts in payload design and engineering utilized the reconstruction system to assess an abstracted version of a payload assembly task.
Participants were tasked with connecting a physical tube and power cable to actual connectors while assessing the virtual hardware layout. These connectors were linked to a virtual payload model, and collision detection was performed between the user, real tools, and physical components against the virtual payload objects.
Innovations
This dissertation explores the integration of dynamic real objects into virtual environments, focusing on the development of algorithms that create interactive virtual representations of these objects. By utilizing graphics hardware, it introduces a novel volume-querying technique to efficiently reconstruct the visual hull of real objects, enhancing their usefulness and application in virtual settings.
We have created hardware-accelerated algorithms for collision detection and response, enabling interactions between real and virtual objects. To our knowledge, this system is the first to integrate arbitrary dynamic real objects into a virtual environment.
We wanted to see if these methods for incorporating real objects were advantageous for cognitive tasks.
Our research investigated how different interaction modalities and avatar fidelity impact task performance and the sense of presence. The findings revealed that engaging with real objects notably enhances performance in spatial cognitive tasks. However, there was no significant variation in the sense of presence based on the visual fidelity of avatars.
We have started implementing our system for an assembly verification task at NASA Langley Research Center (LaRC). Early trials with payload designers indicate that our reconstruction systems are effective in supporting payload development.
Hybrid virtual environments (VEs) are anticipated to expand the range of tasks and applications that can leverage immersive experiences. By integrating real objects, these environments offer higher-fidelity natural interaction, making them more effective and engaging for users.
Previous Work
Incorporating Real Objects into VEs
Our objective is to create a virtual environment (VE) filled with digital representations of real-world objects. We specifically address the challenge of transforming physical objects into their virtual counterparts. After generating these virtual representations, we aim to integrate them seamlessly into the VE.
Incorporation of real objects into virtual environments (VEs) involves subsystems like lighting, rendering, and physics simulations that can recognize and respond to physical objects. This process consists of two key elements: capturing information about real objects and enabling virtual systems to interact with this data. We examine existing algorithms for capturing object information and explore techniques for integrating this data into virtual systems effectively.
Applications that incorporate real objects seek to capture the shape, surface appearance, and motion of the real objects. Object material properties and articulation may also be of interest.
Incorporating real objects into virtual environments depends on specific application needs, such as whether real-time processing is necessary and whether the objects are dynamic, meaning they can move, change shape, or alter their appearance and properties. Additionally, it is essential to determine the required accuracy of these representations and how they will interact with the rest of the virtual environment.
Prebuilt catalog models are often not available for specific real-world objects, making measurement and modeling tedious, especially for complex static items. Capturing the degrees of freedom and articulation of dynamic objects presents even greater challenges. We present three example applications that highlight the need for accurate information capture of intricate real objects.
Creating virtual representations of real-world scenes has significant applications in filmmaking, video games, and 3-D documentation, such as capturing archaeological sites, sculptures, and crime scenes. These models can be utilized in virtual environments for education, visualization, and exploration. Traditional methods, including tape measures, cameras, and CAD systems, are often inefficient and time-consuming. Consequently, automated techniques for accurately capturing the shape and appearance of primarily static scenes have become essential.
Techniques for viewing recorded or live events from novel perspectives have applications in entertainment and analysis, allowing experts to dissect golf swings and enabling sportscasters to deliver dynamic views of plays to television audiences. For example, the Eye Vision system developed by Kanade at CMU covered Superbowl XXXV and generated novel views of the action from image data captured by a ring of cameras mounted in the stadium [Baba00]. This allowed commentators to replay an event from different perspectives, letting the audience see the action from the quarterback's or wide receiver's perspective. This required capturing, within a short amount of time, the shape, motion, and appearance information for a large scene populated with many objects.
Tele-immersion applications enhance traditional videoconferencing by introducing 3-D perception, which researchers believe will improve interpersonal communication through accurate 3-D cues. The Office of the Future project, as detailed by Raskar, creates 3-D models of participants using images from multiple cameras, allowing the transmission of a virtual representation of each person to remote locations. This technology captures the dynamic nature of real objects, ensuring that crucial shape and appearance details are preserved during communication.
We next survey methods for creating virtual representations of real objects, focusing on the modeling and tracking of these objects, and analyze current techniques for coupling the virtual representations with their real-world counterparts.
Creating virtual models of real objects is a specialized area within object and scene reconstruction, with various commercial software packages available for this purpose. Object reconstruction emphasizes gathering data on specific foreground objects, while scene reconstruction aims to capture comprehensive data for an entire environment.
Applications typically impose distinct requirements on the virtual representation of real objects, and different algorithms are tailored to different problem types. Key characteristics of model-generation techniques for real objects include their accuracy, whether they run in real time, and whether they require instrumenting the real objects.
Accuracy refers to the degree to which a virtual representation aligns with the actual object it depicts. In fields like surgical planning, there are stringent standards for accuracy, necessitating that the virtual model closely mirror the real object to ensure effective outcomes.
When evaluating the accuracy of a virtual representation, it is essential to determine whether it is conservative, meaning the virtual volume completely encompasses the real object, or speculative, meaning that some points of the real object may lie outside the virtual volume. Additionally, understanding the systematic and random errors present in the system is crucial for assessing the reliability of the virtual representation.
When evaluating a model generation approach, it is essential to determine whether it is designed for real-time applications or specifically optimized for static objects. For real-time models, key factors to consider include the sampling rates and latency involved in the process.
Capturing object information can be done actively, by instrumenting real objects with trackers, or passively, with methods that avoid physical interaction. For fragile items like historical artifacts, physical contact may cause irreversible damage. Therefore, camera- or laser-based techniques are a more effective approach for gathering data on delicate objects, as demonstrated by Levoy's capture of Michelangelo's David.
Avatars in VEs
Overview
An avatar is an embodiment of an ideal or belief; the term derives from a Sanskrit word for a god taking human form. In virtual environments (VEs), the avatar is the participant's self-representation. This review emphasizes the algorithms used to generate and control self-avatars, as well as the impact of these representations on the immersive experience within VEs. While avatars can represent any real object, this discussion focuses specifically on the visual representation of the participant.
Current virtual environment (VE) systems typically offer participants a selection of avatars from a library, a generic avatar that is the same for everyone, or no avatar at all. Our research indicates that the most prevalent method in VE design is the use of a generic avatar, which adheres to a one-size-fits-all approach.
Research indicates that using generic avatars significantly enhances the sense of presence compared to having no avatar at all. Our observations from the Walking Experiment demo suggest that a realistic avatar may further enhance this sense of presence beyond what generic avatars can provide.
To create precise avatars, it is essential to accurately capture the participant's movements while rendering their form and appearance. Additionally, the avatar should serve as the main interface for user interaction within the virtual environment (VE).
The human body exhibits a wide range of movement and significant variations in shape and appearance among individuals. Usoh emphasizes that tracking all limbs and customizing avatar appearances can lead to substantial improvements in presence within virtual environments. Typically, existing virtual environment systems utilize additional trackers on participants to monitor positional changes, enabling the use of an articulated stock avatar model.
Additional trackers and devices come with their own limitations, which can affect the effectiveness of a virtual environment (VE). The impact of these restrictions varies based on the specific application, making it crucial for designers to take them into account during development.
Capturing an accurate visual representation of a participant's shape and pose is challenging due to the inherent deformability of the human body. For instance, the significant changes in the shape of the hand and arm during actions like grasping and opening a twist-lid jar illustrate this complexity. Traditional rigid-body models lack the flexibility to represent these nuanced shape changes, while creating and managing more elastic models poses its own set of challenges.
The appearance of avatars is a crucial aspect of their design, as it is important to align their virtual look with physical reality. Although achieving this dynamically can be challenging, commercial systems like AvatarMe™ are emerging to create personalized avatars. In this system, users enter a booth where four images are captured, and key landmarks such as the top of the head and armpits are automatically identified. These reference points allow for the deformation of a stock avatar model, onto which the images are then mapped. The resulting personalized avatars can be utilized across various virtual environments, including interactive games and multi-user online spaces.
We have seen how important having an avatar is, but we will examine a popular VE to help identify common issues in providing good, articulated avatars.
The Walking > Virtual Walking > Flying in Virtual Environments project by Usoh et al. features the Walking Experiment, which employs additional limb trackers to manipulate a standardized avatar model. This avatar, designed to be gender and race neutral, is rendered in neutral gray and wears a blue shirt, blue pants, and white tennis shoes. Participants have shared observations such as:
• “Those are not my shoes.”
• (From an African-American teenager) “Hey, I’m not white!”
These comments sparked our investigation to see whether representing participants with a visually faithful avatar would improve the effectiveness of the VE experience.
The Walking Experiment VE has been demonstrated over two thousand times, but a version featuring an articulated tracked avatar, which includes additional tracking for a hand or both hands and feet, has only been shown a few times. This limited exposure can be attributed to several factors:
• The time required to attach and calibrate the trackers for each person decreased the number of people who could experience the VE.
• The increase in system complexity required more software and hardware for both running and maintaining the VE.
• The increase in encumbrance with the wires and tethers for the trackers made the system more prone to equipment failure.
• The increase in fragility of using more equipment made us weigh the advantages of an increase in realism versus an increased risk of damage to research equipment.
So even with a system capable of providing tracked avatars, the additional hardware might make it infeasible or undesirable to present the more elaborate experience for everyone.
Avatar Research
Current research is trying to understand the effects of avatars on the experience in a VE, for example:
• Given that we wish the avatar to represent certain properties, what parts of avatars are necessary?
Since creating, modeling, and tracking a complex avatar model is extremely challenging, it is important to determine how much effort, and in what directions, developers should focus their resources.
Avatars play a crucial role in virtual environments (VEs) by conveying essential information to participants, and researchers are exploring which components contribute most to presence, communication, and interaction. Non-verbal cues such as gestures, gaze direction, and posture contribute significantly, accounting for up to 60% of the information exchanged in face-to-face communication, so it is vital to consider which properties the avatar should represent. Thalmann discusses the current landscape and research challenges related to various avatar components, including rendering, interaction, and tracking.
Recent research indicates that even basic avatar representations can effectively communicate significant information. In a study conducted by Mortensen et al., participants collaborated to navigate a maze while carrying a stretcher, using low-quality visual avatars that primarily displayed their position, orientation, hand cursor, and speech. The findings revealed that participants successfully navigated challenging areas and were able to gauge each other's mood, highlighting the effectiveness of even minimal avatar detail in facilitating interaction and collaboration.
Slater et al. have explored the social implications of avatars in virtual environments (VEs), focusing on how participants engage with these avatars and the parallels to real human interactions. Their early research compared group dynamics in three settings: fully immersive VEs, desktop environments, and real-world scenarios. In both immersive and desktop conditions, participants interacted through simplistic avatars, which elicited emotions like embarrassment, irritation, and self-awareness during virtual meetings. The findings highlight that having a representation of participants is crucial for enhancing social interaction, improving task performance, and fostering a sense of presence in virtual settings.
Garau's study examined participant interactions through various communication methods, including audio only, avatars with random gaze, avatars with inferred gaze based on tracked eye motion, and high-quality audio/video. The findings revealed a notable difference in participant responses, with the inferred-gaze condition consistently outperforming the random-gaze condition in subjective evaluations.
Researchers are investigating the use of avatars to address public speaking phobias and enhance user interaction in distributed tasks. Their findings highlight that even basic self-avatars can significantly impact the sense of presence and interactivity within virtual environments.
Interactions in VEs
Interacting with a virtual environment requires users to provide inputs or adjust variables within a world model simulation. These inputs can be categorized as active, such as scaling objects or navigating menus, or passive, like casting shadows or enabling an avatar's hand to collide with a virtual ball.
Active interaction in virtual environments (VEs) is achieved by converting physical actions, like button presses or hand gestures, into virtual actions such as grasping. For instance, to select an object, a user usually positions their avatar's hand or selection icon to touch the object and then either presses a trigger or performs a grasping or pinching motion.
Passive inputs enhance simulation systems by integrating real-object avatars as supplementary data objects. This integration supports various functions, including rigid-body simulations, lighting and shadow rendering, and collision detection algorithms. As a result, these passive interactions enable the environment to respond predictably, allowing participants to engage with it in familiar ways.
Human-computer interaction researchers, including Doug Bowman and Hand, have explored taxonomies of active inputs in virtual environments (VEs), breaking down actions into fundamental components like selection and translation. Certain tasks, such as deleting or scaling an object, are uniquely active in nature and lack direct real-world counterparts.
In an ideal virtual environment (VE), participants should engage using natural speech and body movements, as human limbs are articulated and their surfaces can deform. The VE system must effectively interpret and respond to expressions, gestures, and motions. However, capturing this complex information for both image rendering and simulation input presents a significant challenge, known as the tracking problem, which remains one of the least advanced aspects of VE technology.
The core challenge in virtual environments (VEs) lies in the lack of real objects, which can limit their effectiveness. Conversely, relying solely on real objects negates the benefits of VEs, such as rapid prototyping and safe training for high-risk tasks. The ideal mix of real and virtual elements varies by application, with flight simulators exemplifying this balance. These simulators utilize a real cockpit combined with a motion platform for realistic sensations, while the external visuals are rendered virtually. This integration is widely adopted for pilot training due to its effectiveness.
The shift to a fully virtual environment eliminates crucial sensory cues, such as motion constraints, tactile feedback, and force responses, which are essential for task performance. Often, these cues are either simulated or completely absent, potentially diminishing the effectiveness of virtual experiences depending on the specific task at hand.
Previous research has explored the impact of interacting with real objects on virtual environment (VE) graphical user interfaces (GUIs). A study by Lindeman et al. compared the effectiveness of 2-D and 3-D GUI widgets alongside various physical interaction surfaces, focusing on two tasks: a slider task and a drag-and-drop task. The virtual GUI was tested with different surface types, including a tracked real surface, a virtual surface, and a clamped virtual surface that restricted avatar movement. Results indicated mixed performance outcomes between the 2-D and 3-D widgets; however, the physical surface outperformed the clamped virtual surface, which, in turn, was more effective than the purely virtual surface.
Current interaction methods in virtual environments (VEs) utilize specialized devices to capture participant inputs and control functions. Among the most common commercial devices are tracked articulated gloves with gesture recognition and buttons (Immersion's Cyberglove [D/interaction/cyberglove.shtml]), a tracked mouse (Ascension Technology's 6D Mouse [http://www.ascension-tech.com/products/6dmouse/]), and a tracked joystick with multiple buttons (Fakespace's NeoWand [http://www.fakespacesystems.com/pdfs/FS_ss_NeoWand.pdf]). Interactions comprise motions and/or button presses.
When standard devices fail to facilitate adequate interaction, custom-engineered solutions are often developed for specific tasks, enhancing user engagement through more intuitive interactions. For instance, Hinckley et al. augmented a doll's head with sliding rods and trackers, allowing doctors to select cutting planes for visualizing MRI data more effectively. Similarly, military combat simulators utilize specialized buttons and trackers on replica firearms for training purposes. While these tailored tools significantly improve interaction compared to traditional methods, their development can be time-consuming and typically limits usability to a narrow range of tasks.
Real Object Reconstruction
Introduction
The objective of integrating a physical object into a hybrid environment is to enable participants to interact with the real object—holding, moving, and using it—while simultaneously viewing a corresponding virtual representation within the digital scene.
There are two methods for creating virtual representations of real objects: offline modeling followed by online tracking and rendering, or real-time capture and rendering of the object's shape and appearance. Our method utilizes the second approach, which necessitates the computation of new virtual representations of real objects at interactive rates.
We introduce a real-time algorithm for computing the visual hull of real objects that leverages recent advances in graphics hardware. This algorithm, alongside the Image-Based Visual Hulls approach, is among the first real-time object reconstruction methods. Notably, it operates without the need to track the real objects and can also be applied to collision detection, as detailed in Chapter 4.
To incorporate real objects into a virtual environment (VE), the first step involves capturing their shape and appearance to create a virtual representation. We utilize the visual hull technique, a shape-from-silhouette method that analyzes the silhouettes of real objects from various viewpoints to approximate their surface. Each silhouette image, when projected from its camera, defines a volume that contains the real objects while excluding part of the surrounding space. The intersection of these silhouette projections approximates the object's shape. The visual hull is conservative: any 3-D point inside the real object is also contained within its visual hull, which circumscribes the object.
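In symbols, this is the standard visual hull formulation: the hull is the intersection of the per-camera silhouette cones. The notation below is our compact restatement, not one of the numbered equations used later in this chapter:

    \mathrm{cone}_i(S_i) = \{\, P \in \mathbb{R}^3 : C_{m,i}\,P \in S_i \,\}, \qquad
    VH(S_1,\ldots,S_n) = \bigcap_{i=1}^{n} \mathrm{cone}_i(S_i)

where S_i is camera i's silhouette (its set of object pixels) and C_{m,i} is its projection.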
Silhouette information alone is insufficient for accurately defining an object's surface geometry, particularly in cases of concavities like the interior of a cup. Even with an infinite number of external views, silhouettes cannot capture these complexities. The visual hull technique relies solely on silhouettes and neglects color information, which could aid in identifying convexity, correlations, and shadows, ultimately limiting its effectiveness in reconstructing the true shape of an object.
Capturing Real Object Shape
The reconstruction algorithm processes images from multiple fixed-position live video cameras to detect newly introduced real objects in the scene through image segmentation. It then generates a novel view of these objects' shapes by using volume-querying.
The image segmentation algorithm focuses on distinguishing foreground objects from a static background in the camera images. By using image subtraction with thresholds, we can identify and extract the objects of interest. Each camera captures a background image of the scene, allowing us to categorize pixels into two groups: object pixels for foreground items and background pixels for the static elements. This process results in an object-pixel map that segments each camera image into these two categories, facilitating the reconstruction of the foreground objects.
Equation 1 – High-level expression for image segmentation
(static background scene + foreground objects) – (static background scene) = foreground objects.
Camera images often contain noise, causing corresponding pixels in multiple images of a static scene to vary slightly in color due to mechanical and electrical factors. Ignoring this color variability can lead to incorrect identification of pixels as part of a foreground object. To manage this issue, a segmentation threshold is applied: each pixel in a new camera image is compared to its corresponding background pixel, and if the color difference exceeds the threshold, the pixel is classified as an object pixel (value set to 1); otherwise, it is classified as a background pixel (value set to 0). This improves the accuracy of object detection within the images.
L i – Source camera image for camera i (x x y resolution) [pixels]
O i – Object-pixel map for camera i (x x y resolution) [pixels]
B i – Background image for camera i (x x y resolution) [pixels]
T i – Segmentation threshold map for camera i (x x y resolution) [pixels]
To manage noise in static scenes, we use per-pixel segmentation threshold values, creating a threshold map that reflects the statistical characteristics of each camera's background image noise. Pixels associated with edges or high spatial frequency exhibit greater variation due to camera vibration, necessitating careful threshold calibration. If the threshold is set too high, important object pixels may be missed. To mitigate this issue, we reduce high spatial frequency areas in the background by covering most surfaces with dark cloth.
Image segmentation can be significantly affected by factors such as shadows, lighting variations, and image noise. For instance, changing the lighting conditions without acquiring new background images can lead to increased segmentation errors. To mitigate this, we aimed to maintain consistent lighting throughout. Instead of trying to identify or eliminate actual object shadows, we used diffuse lighting to keep shadows soft and less pronounced.
In our implementation of image segmentation, five frames of the static background are captured from each camera during initialization. These images are averaged to create a background image, and the maximum deviation from this average is used to establish a per-pixel segmentation threshold map. Our findings indicate that five images of the static background are adequate for generating effective background images and segmentation threshold maps.
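The following sketch illustrates this initialization and the per-pixel threshold test. It is a minimal, single-channel C++ version with illustrative names (buildBackgroundModel, segment); the actual system works on color images and runs once per camera.

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// One grayscale frame, width*height pixels (a simplification of the color
// images used by the real system).
struct Frame { int w, h; std::vector<uint8_t> px; };

// Average several frames of the empty scene into a background image B, and use
// the maximum deviation from that average as the per-pixel threshold map T.
void buildBackgroundModel(const std::vector<Frame>& empty, Frame& B, Frame& T)
{
    const int w = empty[0].w, h = empty[0].h, n = (int)empty.size();
    B = { w, h, std::vector<uint8_t>(w * h) };
    T = { w, h, std::vector<uint8_t>(w * h) };
    for (int p = 0; p < w * h; ++p) {
        int sum = 0;
        for (const Frame& f : empty) sum += f.px[p];
        B.px[p] = (uint8_t)(sum / n);
        int maxDev = 0;
        for (const Frame& f : empty)
            maxDev = std::max(maxDev, std::abs((int)f.px[p] - (int)B.px[p]));
        T.px[p] = (uint8_t)maxDev;            // noisier pixels get larger thresholds
    }
}

// Object-pixel map: O = 1 where |L - B| > T, else 0 (the thresholded form of Equation 1).
std::vector<uint8_t> segment(const Frame& L, const Frame& B, const Frame& T)
{
    std::vector<uint8_t> O(L.px.size());
    for (size_t p = 0; p < O.size(); ++p)
        O[p] = (std::abs((int)L.px[p] - (int)B.px[p]) > (int)T.px[p]) ? 1 : 0;
    return O;
}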
In image segmentation, the current image is compared to the background image to identify object pixels using the threshold. The object pixels are stored in the alpha channel; for visualization, the color components of the background pixels are cleared, allowing a clearer view of the object pixels.
Image segmentation with thresholds is essentially the same as Chromakeying, a standard technique for separating foreground objects from a monochromatic background, used in television and movies.
The image segmentation process augments the camera image with an object-pixel map stored in the alpha channel. In this representation, object pixels are assigned an alpha value of 1, indicating full opacity, while background pixels are assigned an alpha value of 0, signifying transparency.
The volume-querying algorithm uses the object-pixel maps obtained from image segmentation to render the visual hull of the real objects [Laurentini94]. The goal is to render the visual hull from a viewpoint that differs from any of the original camera viewpoints. To achieve this, we employ a method we call volume-querying, a variation of traditional volume-definition techniques that use boundary representations [Kutulakos00].
Volume-querying asks: given a 3-D point P, is it within the visual hull (VH) of a real object in the scene?
P is within the visual hull iff, for each camera i (with projection matrix C_m,i), P projects onto an object pixel j of camera i, i.e., O_i,j = 1.
~VH_object – (calculated) Visual hull of the real object
Camera i is characterized by its extrinsic parameters, including translation (a 3 x 1 vector) and rotation (a 3 x 3 matrix), and by its intrinsic parameters, which encompass radial distortion (a scalar), principal point (a 2 x 1 vector), and focal lengths (a 2 x 1 vector). The camera's resolution is C_s (x x y). The projection matrix, C_m, is a 4 x 4 matrix derived from the combination of the camera's extrinsic and intrinsic parameters.
P ∈ ~VH_object iff ∀ i, ∃ j such that j = C_m,i · P and O_i,j = 1
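This predicate translates directly into a per-point test: transform P into each camera, project it, and require an object pixel at the resulting location. The C++ sketch below is a CPU restatement of that test (the system itself evaluates it on graphics hardware, as described next); the Camera struct is illustrative and omits the radial-distortion term of the camera model for brevity.

#include <cmath>
#include <vector>

// Minimal pinhole camera for illustration (no radial distortion term).
struct Camera {
    double r[3][3];                  // rotation
    double t[3];                     // translation
    double fx, fy, cx, cy;           // focal lengths and principal point [pixels]
    int width, height;               // image resolution
    std::vector<unsigned char> O;    // object-pixel map: 1 = object, 0 = background
};

// P is inside the visual hull iff, for every camera, P projects onto an object pixel.
bool insideVisualHull(const double P[3], const std::vector<Camera>& cams)
{
    for (const Camera& c : cams) {
        // Transform into camera coordinates: p = r * P + t.
        double x = c.r[0][0]*P[0] + c.r[0][1]*P[1] + c.r[0][2]*P[2] + c.t[0];
        double y = c.r[1][0]*P[0] + c.r[1][1]*P[1] + c.r[1][2]*P[2] + c.t[1];
        double z = c.r[2][0]*P[0] + c.r[2][1]*P[1] + c.r[2][2]*P[2] + c.t[2];
        if (z <= 0.0) return false;                       // behind camera i
        // Pinhole projection to pixel coordinates.
        int u = (int)std::lround(c.fx * x / z + c.cx);
        int v = (int)std::lround(c.fy * y / z + c.cy);
        if (u < 0 || u >= c.width || v < 0 || v >= c.height)
            return false;                                 // outside camera i's image
        if (c.O[v * c.width + u] == 0)
            return false;                                 // background pixel: P is outside
    }
    return true;                                          // object pixel in every camera
}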
To render the visual hull from a new viewpoint, we volume-query the new view's frustum. This process determines which points within the novel view volume are inside the visual hull.
Object pixels capture the projection of real objects onto a camera's image plane. The visual hull is the intersection of the three-dimensional right cones formed by projecting the two-dimensional object-pixel maps.
Figure 4 – The visual hull of an object is the intersection of its object-pixel projection cones.
To compute the intersection of object pixels from multiple cameras, each pixel's projected volume must be tested against the projected volumes of the pixels of all other cameras. With n cameras operating at a resolution of u x v, the computational complexity is (u*v)^2 * (n-1) intersection tests per frame. The resulting reconstruction volume is the intersection of all camera frusta, within which the object-pixel intersections are found.
With three NTSC cameras, the potential number of pyramid-pyramid intersection tests per frame is approximately 2.45 * 10^11, calculated as (720*486)^2 * (3-1). The number of intersection tests grows linearly with the number of cameras and quadratically with the number of pixels per camera.
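For the record, that figure follows directly from the expression above (our arithmetic):

    (u\,v)^2\,(n-1) = (720 \cdot 486)^2 \cdot (3-1)
                    \approx (3.5 \times 10^{5})^2 \cdot 2
                    \approx 2.45 \times 10^{11} \ \text{tests per frame}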
We accelerate volume-querying by using graphics hardware capabilities, including projected textures, alpha testing, and stencil testing, along with the depth, stencil, and frame buffers, to perform the intersection tests. Our goal is to render the visual hull from the same viewpoint, view direction, and field of view as the rendered virtual environment. To do this for a u x v resolution viewport, we use the following standard graphics hardware buffers, found in popular chipsets such as the nVidia GeForce4, SGI Infinite Reality 3, and ATI Radeon (a sketch of querying a single sweep plane with these buffers follows the list):
• frame buffer – u x v array of color values of the first-visible surface of the visual hull. Each element in the frame buffer has four values: red, green, blue, and alpha
• depth buffer – u x v array of depth values from the eye viewpoint to the first-visible surface of the visual hull
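The sketch below shows how a single sweep plane can be volume-queried with these buffers: each camera's object-pixel map (stored in texture alpha) is projectively textured onto the plane, the alpha test discards background pixels, and the stencil buffer counts, per pixel, how many cameras saw an object there; only pixels covered by all n cameras survive to the final pass. This is a simplified, hedged rendering of the approach, assuming fixed-function OpenGL of that era; loadCameraProjectionIntoTextureMatrix and drawSweepPlane are placeholder helpers, and the real system's pass structure and texturing details differ.

#include <GL/gl.h>

// Assumed to be provided elsewhere: set the texture matrix to camera i's projection
// (for projective texturing of its object-pixel map), and draw a view-filling quad
// at the given depth in the sweep.
void loadCameraProjectionIntoTextureMatrix(int cameraIndex);
void drawSweepPlane(float planeDepth);

void volumeQueryPlane(const GLuint* objectPixelTex, int numCameras, float planeDepth)
{
    glClear(GL_STENCIL_BUFFER_BIT);
    glEnable(GL_STENCIL_TEST);
    glEnable(GL_ALPHA_TEST);
    glAlphaFunc(GL_GREATER, 0.5f);               // only object pixels (alpha = 1) pass

    // Counting passes: project each camera's object-pixel map onto the plane and
    // increment the stencil wherever an object pixel lands.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);
    glStencilFunc(GL_ALWAYS, 0, ~0u);
    glStencilOp(GL_KEEP, GL_KEEP, GL_INCR);
    glEnable(GL_TEXTURE_2D);
    for (int i = 0; i < numCameras; ++i) {
        glBindTexture(GL_TEXTURE_2D, objectPixelTex[i]);
        loadCameraProjectionIntoTextureMatrix(i);
        drawSweepPlane(planeDepth);
    }
    glDisable(GL_TEXTURE_2D);

    // Final pass: pixels whose stencil equals the camera count lie inside every
    // projection cone, i.e. inside the visual hull; write their color and depth.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);
    glDisable(GL_ALPHA_TEST);
    glStencilFunc(GL_EQUAL, numCameras, ~0u);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    drawSweepPlane(planeDepth);

    glDisable(GL_STENCIL_TEST);
}

Sweeping this over the set of planes spaced through the view frustum leaves the first visible surface of the visual hull in the frame and depth buffers.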
Capturing Real Object Appearance
To capture the appearance of real objects from the participant's perspective, we use a lipstick camera with a mirror attachment mounted on the head-mounted display (HMD). This setup ensures that the camera's virtual viewpoint and view direction closely align with those of the participant. The images obtained from this camera are used to texture the visual hull, effectively addressing the complex visibility and lighting challenges involved in computing correct pixel colors.
When rendering from perspectives other than the participant's viewpoint, camera image data is utilized to color the visual hull. However, because our algorithm diverges from traditional modeling approaches, computing color and visibility per pixel is both costly and complex.
We employed two techniques to color the first visible surface of the visual hull. The first technique blended the camera textures during the plane sweeping, with each texture assigned a blending weight based on the angle between the camera's view direction and the plane's normal. However, this method resulted in several texturing artifacts, including incorrect coloring, texture replication across multiple planes, and visible texture borders. These issues arose from the lack of visibility computation, inadequate visual hull sampling, and discrepancies in shape between the actual object and the visual hull.
The second approach produced a coarse mesh from the reconstruction depth buffer. Each mesh vertex was colored using the camera whose view direction most closely matched that vertex's normal. However, this method was slow and still produced noticeable artifacts in the final output.
Both of our approaches fail to provide an adequate non-user-viewpoint coloring solution. The Image-Based Visual Hulls algorithm developed by Matusik computes the model and visibility, making it more suitable for reconstruction from perspectives other than the participant's.
Combining with Virtual Object Rendering
In the plane-sweeping step, the planes are rendered and volume-queried in the same coordinate system as the virtual environment, ensuring accurate depth buffer values for the new viewpoint. This approach effectively resolves occlusions between real and virtual objects by utilizing depth information from the viewer's perspective, allowing for a seamless visual composition of real-object avatars with the virtual environment.
Integrating real-object avatars into a virtual environment also requires handling lighting and shading interactions. This process begins by generating a polygon mesh from the reconstruction depth buffer values, which allows virtual lights to illuminate the real-object avatars. Using OpenGL, the mesh is rendered, and the lit vertices are blended with the HMD camera texture. Additionally, standard shadowing algorithms can be employed to enable virtual objects to cast shadows on the real-object avatars.
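One way to obtain such a mesh, under the assumptions above, is to read back the reconstruction depth buffer and unproject a coarse grid of its pixels into world-space vertices that standard OpenGL lighting can then shade. The sketch below uses gluUnProject, an illustrative step parameter to control mesh coarseness, and omits the triangulation and HMD-texture blending steps.

#include <GL/gl.h>
#include <GL/glu.h>
#include <vector>

struct Vertex { double x, y, z; };

// Unproject a coarse grid of reconstruction depth-buffer samples into world-space
// vertices; neighboring vertices can then be triangulated into a lightable mesh.
std::vector<Vertex> depthBufferToVertices(int width, int height, int step)
{
    std::vector<float> depth(width * height);
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, depth.data());

    GLdouble model[16], proj[16];
    GLint viewport[4];
    glGetDoublev(GL_MODELVIEW_MATRIX, model);
    glGetDoublev(GL_PROJECTION_MATRIX, proj);
    glGetIntegerv(GL_VIEWPORT, viewport);

    std::vector<Vertex> verts;
    for (int y = 0; y < height; y += step)
        for (int x = 0; x < width; x += step) {
            float z = depth[y * width + x];
            if (z >= 1.0f) continue;               // far plane: no hull surface here
            Vertex v;
            gluUnProject(x + 0.5, y + 0.5, z, model, proj, viewport, &v.x, &v.y, &v.z);
            verts.push_back(v);
        }
    return verts;
}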
Shadows of real-object avatars on virtual objects are generated by reconstructing the real objects from the perspective of the light source. The resulting depth buffer is converted into a texture used to shadow the virtual environment geometry.
Performance Analysis
The visual hull algorithm consists of image segmentation and volume querying. This analysis excludes the time and bandwidth costs of capturing new images, transferring image data between processors, and rendering the virtual environment.
Image segmentation computes the object pixels by subtracting each camera-image pixel from the corresponding background pixel and comparing the difference against the segmentation threshold, every frame. With n cameras operating at a resolution of u x v, this requires u*v*n subtractions and comparisons per frame.
Volume-querying imposes both geometry-transformation and fill-rate demands. For n cameras rendering l planes at u x v resolution, with an i x j grid used for camera-distortion correction, the geometry transformation requires (2(n*i*j)+2)*l triangles per frame. Each plane performs u*v point volume-queries in parallel, and since every pixel is rendered n+1 times per plane, the fill rate required is u*v*(n+1)*l pixels per frame.
The geometry-transformation load per frame thus depends on the number of cameras, the number of planes, and the grid size. The SGI Reality Monster can transform approximately 1 million triangles per second, while the nVidia GeForce4 can transform around 75 million triangles per second.
Fill rate, measured in pixels per second, also varies significantly between platforms: the SGI Reality Monster achieves a fill rate of approximately 600 million pixels per second, while the nVidia GeForce4 has a fill rate of about 1.2 billion pixels per second.
To reconstruct a volume one meter deep with 1-centimeter spacing between planes, using a single field from each of three NTSC cameras (30 Hz) into a 320 x 240 window at 15 frames per second, the system performs 15.7 million image-segmentation subtractions and threshold tests per second, transforms 0.23 million triangles per second, and requires a fill rate of 460 million pixels per second.
Given these hardware limits, the fill-rate requirement constrains the number of planes with which the volume can be sampled, and thus the accuracy of the reconstruction.
At 320 x 240 resolution with 3 cameras and reconstructing at 15 frames per second, on the SGI, we estimate one can use 130 planes, and on a GeForce4, 261 planes.
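Those plane counts follow from the fill-rate expression above, treating the quoted fill rates as exact (our arithmetic; the triangle-transform load is not the binding constraint at these settings):

    u\,v\,(n+1)\,\mathrm{fps} = 320 \cdot 240 \cdot 4 \cdot 15 \approx 4.6 \times 10^{6} \ \text{pixels/s per plane}

    l_{\mathrm{SGI}} \approx \frac{6 \times 10^{8}}{4.6 \times 10^{6}} \approx 130, \qquad
    l_{\mathrm{GeForce4}} \approx \frac{1.2 \times 10^{9}}{4.6 \times 10^{6}} \approx 260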
Accuracy Analysis
The accuracy of a virtual representation of a real object is determined by two key factors: the fidelity of the shape and the resemblance of the appearance.
The main source of error in capturing the shape of a real object, compared to its real-object avatar, is the visual hull approximation itself. This approach introduces a baseline level of shape error that cannot be reduced by addressing other error sources. The discrepancies between the visual hull and the actual object shape are discussed in [Niem97]. For example, for a sphere 10 cm in diameter viewed by three cameras 2 meters away along the three primary axes, a point up to 1.26 cm outside the sphere can still lie within the sphere's visual hull. For less convex objects or fewer camera views, the potential error increases significantly.
We now consider the sources of error for the rendered shape of the visual hull of a real object. The shape, I_final, is represented by a sample point set in 3-space, located on a set of planes.
The final image of the visual hull is derived from three key components: image segmentation (as detailed in Equation 2), volume-querying (explained in Equation 3), and visual hull sampling (outlined in Equation 4).
Equation 6 - Novel view rendering of the visual hull
I_final – Novel view of the visual hull of the real object
P – a 3-D point (3 x 1 vector) [meters]
p – 2-D projection of P (2 x 1 vector)
PS – Spacing between planes for plane sweep volume-querying [meters]
U – User's pose (tracker report for position and orientation, field of view, near plane, far plane)
U_m – Projection matrix defined by the user's pose
S – Novel view screen resolution (u x v) [pixels]
Camera i is characterized by its extrinsic parameters, including translation (a 3 x 1 vector) and rotation (a 3 x 3 matrix), and by its intrinsic parameters, which encompass radial distortion (a scalar), principal point (a 2 x 1 vector), and focal lengths (a 2 x 1 vector). The camera's resolution is C_s (x x y). The projection matrix, C_m, is a 4 x 4 matrix that combines the camera's extrinsic and intrinsic parameters.
L i – Source camera image for camera i (x x y resolution) [pixels]
O i – Object-pixel map for camera i (x x y resolution) [pixels]
B i – Background image for camera i (x x y resolution) [pixels]
T i – Segmentation threshold map for camera i (x x y resolution) [pixels]
There are three kinds of error for I_final: errors in shape, appearance, and location.
Image Segmentation
Here is the expression for image segmentation again (Equation 2): for pixel j of camera i, O_i,j = 1 if |L_i,j - B_i,j| > T_i,j, and O_i,j = 0 otherwise.
The errors in the image segmentation for a pixel come from three sources:
1) The difference between the foreground object color and the background color is smaller than the segmentation threshold value.
2) The segmentation threshold value is too large, and object pixels are missed, commonly due to high spatial frequency areas of the background.
3) Light reflections and shadowing cause background pixels to differ by greater than the segmentation threshold value.
The incorrect segmentation of pixels results in the following errors of visual hull size:
1) Labeling background pixels as object pixels incorrectly increases the size of the visual hull.
2) Labeling object pixels as background pixels incorrectly reduces the size of the visual hull or yields holes in the visual hull.
Errors in image segmentation do not contribute to errors in the visual hull location.
In our study, we kept segmentation threshold values low by draping dark cloth over most surfaces, which reduced high spatial frequency areas, and by maintaining consistent, diffuse lighting. We used foreground objects that contrasted significantly in color with the background; our Sony DFW-500 cameras exhibited about 2 percent color variation in the static cloth-draped scenes. Additionally, we found that establishing minimum and maximum segmentation thresholds per camera, determined through empirical testing, significantly decreased image segmentation errors.
For volume-querying, we assume that camera pixels are rectangular and exhibit only radial distortion, with no higher-order distortions. We consider the camera model equation (Equation 5) together with the volume-querying equation (Equation 3).
Equation 5 – Camera model
p_projected = r · P + t                                        (transform into camera coordinates)
p_normalized = (x / z, y / z)  for p_projected = (x, y, z)     (pinhole projection)
p_distorted = p_normalized · (1 + d · |p_normalized|^2)        (radial distortion)
p_final = f · p_distorted + pp                                 (focal lengths and principal point)
How accurately the volume swept by an object pixel corresponds to physical space depends on the inverse of the camera matrix (C m^-1), which projects pixels from the camera's image plane into world rays. The camera matrix is defined by the camera's extrinsic parameters, translation (a 3 x 1 vector) and rotation (a 3 x 3 matrix), and by its intrinsic parameters, radial distortion (a scalar), principal point (a 2 x 1 vector), focal lengths (a 2 x 1 vector), and resolution (C s, x x y).
For a camera setup with a 1 cubic meter reconstruction volume, the accuracy of volume-querying is most sensitive to camera rotation and resolution. Even a small rotational error, such as 1 degree, produces an error of up to 5.75 cm in the reconstructed volume when projecting 3-D points onto the 2-D camera image plane, which underscores the importance of precise camera calibration.
Camera resolution determines the minimum size of foreground object that can be reconstructed, because the undistorted 2-D projection of a 3-D point is rounded to integer coordinates in the camera's object-pixel map, introducing error into volume-querying. In our camera setup, the maximum distance from any camera to the farthest point in the reconstruction volume is 3.3 meters. Using NTSC-resolution cameras (720 x 243) with 24-degree FOV lenses, each pixel sweeps a pyramidal volume whose base at that distance measures up to 0.58 cm by 0.25 cm.
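These bounds can be checked with a back-of-the-envelope calculation using the quoted geometry (approximate; assumes the error sweeps laterally at the 3.3 m maximum distance):

```python
import math

dist = 3.3                     # farthest point from a camera [m]
rot_err_deg = 1.0              # example rotational calibration error
# A 1-degree rotation error sweeps a pixel ray sideways by roughly
# dist * tan(error) at the far side of the volume.
print(dist * math.tan(math.radians(rot_err_deg)) * 100)   # ~5.76 cm

fov_deg, lines = 24.0, 243     # vertical FOV and NTSC field lines
extent = 2 * dist * math.tan(math.radians(fov_deg / 2))   # image extent [m]
print(extent / lines * 100)    # ~0.58 cm per pixel in that dimension
```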
Errors in camera calibration affect the shape of the visual hull, with the dominant contribution coming from errors in camera rotation. The rotational error projects into the volume and bounds the certainty with which any point in that volume can be queried. Quantifying the effect on the visual hull's location is more involved.
Calibration errors cause an object pixel to sweep out a volume that does not correspond exactly to the physical space it should represent, as the pixel's projection extends from the camera's image plane through the lens into the working volume.
A calibration error therefore shifts the volume that an object pixel projects, but it does not move the visual hull: the erroneously projected volume is unlikely to overlap the object-pixel projections from the other cameras, so the visual hull shrinks rather than moves.
For example, when three cameras capture a 7.5 cm cube, a 0.1-degree rotation error in one camera positioned 2 meters above the cube reduces the visual hull by approximately 4 mm. The erroneous projection from the miscalibrated camera is unlikely to coincide with the projections from the other cameras, so the visual hull's location remains essentially unchanged. For the visual hull to shift, all cameras would need calibration errors aligned in the same world-space direction.
To determine the extrinsic parameters of our cameras, we positioned them as close as possible to the working volume and used the UNC HiBall system, attached to a stylus, to digitize camera locations and scene points. The HiBall provides sub-millimeter position accuracy and 0.1-degree orientation accuracy, which translates to an estimated error of about 1 pixel in rotation and sub-millimeter error in position. For intrinsic parameter estimation, we captured an image of a checkerboard pattern filling the camera's field of view and used the stylus to digitize specific points on the checkerboard. By overlaying the digitized points on the captured image and manually adjusting the intrinsic parameters, we undistorted the image data to within approximately 0.5 cm for checkerboard points in the reconstruction volume.
Implementation
Hardware
We have implemented the reconstruction algorithm in a system that reconstructs objects within a 5-foot x 4-foot x 3-foot volume above a tabletop, as shown in Figure 7.
Figure 7 – The overlaid cones represent each camera's field of view. The reconstruction volume is within the intersection of the camera view frusta.
The system uses three wall-mounted NTSC cameras (720 x 486 resolution) and one camera (640 x 480 resolution) attached to a Virtual Research V8 HMD. One camera is directly overhead, one is to the left of the table, and one is placed diagonally about three feet above the table. The camera placement is not ideal: the angles between their view directions are not maximized, owing to constraints of lab space and maintainability.
At startup, the system captures and averages five images from each camera to create the background images. Initially we used one image per camera, but dynamic real objects then produced visual hulls with bands of shifted volume caused by interlaced textures. Our second approach captured one background image per field, but stationary objects then appeared to jitter because the visual hulls computed from alternating fields were not identical. We finally adopted the simpler approach of always using field zero; although this increases reconstruction error, it reduces latency and minimizes shearing of dynamic real objects.
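A sketch of the background-capture strategy we settled on (hypothetical helper; `grab_frame` stands in for the actual capture call):

```python
import numpy as np

def capture_background(grab_frame, n_frames=5):
    """Average several frames, keeping only field zero (even scanlines)."""
    fields = []
    for _ in range(n_frames):
        frame = grab_frame()        # full interlaced frame, H x W x 3
        fields.append(frame[0::2])  # field zero only
    # Averaging suppresses per-frame camera noise; using a single field
    # avoids the interlacing shear seen with moving real objects.
    return np.mean(fields, axis=0)
```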
The UNC HiBall, a scalable wide-area optical tracker, tracks the participant [Welch97], as shown in Figure 8. The figure also shows the camera-and-mirror fixture mounted on the HMD, which is used to texture the reconstruction.
Figure 8 – Virtual Research V8 HMD with UNC HiBall optical tracker and lipstick camera mounted with reflected mirror.
The four cameras are connected to Digital In – Video Out (DIVO) boards on an SGI Reality Monster system. Although PC graphics cards can handle the algorithm's transformation and pixel-fill loads, the SGI's superior video input capability, multiple processors, and high memory-to-texture bandwidth made it the preferred platform during initial development.
Over the past two years, various multi-camera algorithms have been implemented on dedicated networks of commodity PCs with Firewire cameras. With the growth in PC memory bandwidth and in video card texture bandwidth through AGP 8X, moving to a PC-based system is now practical. Such systems offer shorter development cycles, faster upgrades, and earlier access to new hardware features, and their increased processing power supports operations such as image segmentation.
The SGI uses five graphics pipelines. A parent pipe assembles the reconstruction results and renders the virtual environment, while a dedicated video pipe captures and broadcasts the camera images. Two reconstruction pipes perform volume-querying, asynchronously processing the latest camera images through image segmentation and volume intersection and passing their results to the parent pipe; the number of reconstruction pipes is a balance between reducing latency and increasing frame rate, both of which improve with additional pipes. A simulation pipe runs the virtual simulations, including rigid-body and cloth dynamics, and performs the collision detection and response tests. All results are combined in the parent pipe, which renders the VE together with the reconstruction, using multiple processors for tasks such as image segmentation.
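The division of labor among the pipes can be sketched as a set of asynchronous workers (an illustrative threading sketch, not the actual SGI pipe code; all function and variable names are placeholders):

```python
import threading

latest = {"frames": None, "hull": None, "sim": None}
lock = threading.Lock()

def video_pipe(grab_all_cameras):
    # Captures camera images and always keeps only the newest set.
    while True:
        frames = grab_all_cameras()
        with lock:
            latest["frames"] = frames

def reconstruction_pipe(segment, volume_query):
    # Segments the latest images and volume-queries them asynchronously.
    while True:
        with lock:
            frames = latest["frames"]
        if frames is not None:
            hull = volume_query(segment(frames))
            with lock:
                latest["hull"] = hull

def simulation_pipe(step_simulation):
    # Rigid-body / cloth dynamics plus collision detection and response.
    while True:
        with lock:
            hull = latest["hull"]
        if hull is not None:
            state = step_simulation(hull)
            with lock:
                latest["sim"] = state

def parent_pipe(render_ve):
    # Composites the newest reconstruction and simulation state into the VE.
    while True:
        with lock:
            hull, sim = latest["hull"], latest["sim"]
        render_ve(hull, sim)
```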
Reconstruction is performed in a 320 x 240 window to limit fill-rate demands; the results are then scaled to 640 x 480, the resolution used for VE rendering. The Virtual Research V8 HMD has a maximum resolution of 640 x 480 at 60 Hz.
The system runs at 15-18 frames per second with planes spaced 1.5 centimeters apart over a depth of approximately 0.7 meters, about 50 planes within the novel view volume. Image segmentation accounts for roughly half of the per-frame computation time, and the reconstruction itself runs at 22-24 frames per second. This corresponds to a geometry transformation rate of 16,000 triangles per second and a fill rate of 1.22 billion pixels per second, with an estimated end-to-end latency of about 0.3 seconds.
The reconstruction result is equivalent to the first visible surface of the visual hull of the real objects, within the sampling resolution (Figure 9).
This screenshot from our reconstruction system shows the reconstruction of the participant rendered together with virtual objects, with correct occlusion between the participant's real hand and the virtual teapot handle.
The hardware-accelerated reconstruction algorithm takes advantage of recent graphics hardware, and the same approach enables efficient detection of intersections between virtual models and real-object avatars; this is explored further in Chapter 4.
For efficiency, we restrict our attention to the regions of real space that can affect the final image; therefore, only points within the participant's view volume are volume-queried.
Participants are encouraged to handle real objects while interacting with a hybrid environment containing a virtual faucet and particle system. Their avatars cast shadows on virtual objects and interact with the water particles, leading to spontaneous behaviors such as cupping the hands to catch the water, positioning objects to watch the particle flow, and playfully trying to drink the synthetic water. Because this interaction requires no additional trackers, participants explore uninhibitedly and often do unexpected things.
Sampling a large volume with planes is expensive, with a complexity of O(n^3), so there is a tradeoff between sampling resolution and performance for substantial volumes. We found that a 1.5-centimeter resolution for novel view volumes up to about 1 meter deep allows real-time reconstruction at a quality adequate for tabletop applications.
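The plane count behind these figures follows directly from the spacing (a quick check):

```python
depth, spacing = 0.7, 0.015       # view-volume depth and plane spacing [m]
print(round(depth / spacing))     # ~47 planes, consistent with the roughly 50 quoted
```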
Visibility, the problem of assigning colors to pixels while accounting for occlusion from the source cameras, is difficult for hardware-based algorithms. When an HMD-mounted camera provides high-resolution texture, the problem is mitigated because we require only a first-person view of the real objects. For applications such as event replay or multi-user VEs, however, visibility must be addressed. Methods such as blended textures or textured depth-meshes can produce coloring artifacts, whereas Matusik's IBVH approach computes both the model and visibility by tracking the contribution of each source image to each final pixel.
In this chapter we presented a hardware-accelerated algorithm that captures the shape and appearance of real objects. These real-object avatars were rendered together with virtual objects. The next chapter discusses algorithms for handling collisions between these real-object avatars and other virtual objects.
Collision Detection
Overview
Collision detection and response algorithms, combined with lighting and shadowing techniques, complete the incorporation of real objects into the hybrid environment. They allow real objects to serve as dynamic inputs to simulations, so that users can interact with virtual objects naturally, as they would in a fully real environment.
In addition to visually incorporating real objects into our hybrid environments, we aim for real-object avatars to influence the virtual elements of these settings.
For example, a participant's avatar can part a virtual curtain to look out a virtual window. At each simulation time-step, the cloth dynamics are updated using the collisions detected between the virtual objects and the real-object avatars, which heightens the realism of the experience.
In this interaction, the participant parts the virtual curtains to look out the window, and the collisions detected between the virtual curtains and the participant's real-hand avatars are the input to the cloth simulation.
This chapter presents algorithms for detecting collisions between real-object avatars and virtual objects and for generating plausible responses. Together with accurate lighting, shadowing, and physics, these algorithms allow real-object avatars to be integrated seamlessly with virtual objects.
The interaction between a real hand and virtual cloth begins with collision detection, which identifies when the hand contacts the cloth. This step provides the information the application needs to simulate a realistic response of the cloth.
We define an interaction as one object affecting another. Given environments that contain both real and virtual objects, there are four types of interaction to consider:
• Real-real: collisions between real objects are governed by the laws of physics; the energy transferred during impact can move, deform, and redirect the objects involved.
• Virtual-virtual: collisions between virtual objects are handled with standard collision detection packages and simulations determine response.
• Real-virtual: we introduce a novel image-space algorithm that detects intersections between virtual objects and the visual hulls of real objects. The algorithm also provides the data a simulation needs to resolve unnatural interpenetration between the two. Our approach builds on the volume-querying technique of Chapter 3.
• Virtual-real: We do not handle the case of virtual objects affecting real objects due to collisions.
• Primary rule: Real-object avatars are registered with the real objects.
• Virtual objects cannot physically affect the real objects themselves We do not use any mechanism to apply forces to the real object.
• Therefore, virtual objects are not allowed to affect the real-object avatars’ position or shape.
• Corollary: Whenever real-object avatars and virtual objects collide, the application modifies only the virtual objects.
Visual Hull – Virtual Model Collision Detection
Standard collision detection algorithms use geometric models to detect collisions among objects. Our system, however, does not explicitly build a geometric model of the visual hull during reconstruction. Instead, we developed new algorithms that use the camera images of the real objects to detect collisions between real-object avatars and virtual objects. The avatar algorithm of Chapter 3 does not construct a complete model of the real objects; it volume-queries only within the participant's view frustum. Similarly, the collision algorithm checks for collisions by volume-querying with the primitives of the virtual objects.
The inputs to the real-virtual collision detection algorithm are a set of n live camera images and virtual objects defined by geometric boundary representations, primarily triangles. Triangles are used because they are the most common representation for virtual objects and because graphics hardware is optimized for transforming and rendering them. The algorithm can be extended to other geometric representations, which are in any case typically decomposed into triangles for efficiency.
The outputs of the real-virtual collision detection algorithm are:
• Set of points on the boundary representation of the virtual object in collision with a real- object avatar (CP i )
The outputs of the collision response algorithm are estimates within some tolerance for:
• Point of first contact on the virtual object (CP obj ).
• Point of first contact on the visual hull (CP hull ).
• Recovery vector (V rec ) along which to translate the virtual object to move it out of collision with the real-object avatar.
• Distance to move the virtual object (D rec ) along the recovery vector to remove it from collision.
• Surface normal at the point of first contact on the visual hull (N hull ).
Assumptions
A set of simplifying assumptions makes interactive-time real-virtual collision detection a tractable problem.
Assumption 1: Only virtual objects can move or deform as a consequence of collision. This follows from our restriction that virtual objects cannot affect real objects. The behavior of virtual objects is entirely under the control of the application program, so they can be moved as part of the response to a collision; we do not attempt to move real objects or the real-object avatars.
Assumption 2: Both real objects and virtual objects are considered stationary at the time of collision.
Collision detection uses only the positions captured at a single instant, and the real-object avatars are recomputed every frame; no information, such as the centroid of the visual hull, is retained between frames. Consequently, the real-virtual collision detection algorithm has no data about the motion of the real objects or their hulls.
Because of Assumption 2, the algorithm cannot determine how the real and virtual objects came into collision, so it cannot compute the precise vector that would return the virtual object to its position at the moment of impact. Instead, our algorithm provides a way to move the virtual object out of collision.
Assumption 3: There is at most one collision between a virtual object and the real-object visual hull at a time. If the real object and virtual object intersect at disjoint locations, we apply a heuristic to estimate the point of first contact, because we cannot backtrack the real object to compute the true point of first contact. For example, virtual fork tines penetrating the visual hull of a real sphere would return only one estimated point of first contact. We move the virtual object out of collision based on our estimate of the deepest point of collision.
Assumption 4: The real objects that contribute to the visual hull are treated as a single object.
The real-object avatar may visually consist of separate volumes, such as two hands, but computationally it is a single visual hull encompassing all real objects in the scene. Collision detection therefore treats multiple real objects as one; for instance, a real oil filter and the user's hand form a single visual hull. While this suffices for scenarios such as assessing whether a mechanic can maneuver a part within an engine, other applications may need to distinguish between individual real objects.
Assumption 5: We detect collisions shortly after a virtual object intersects and enters the visual hull, not when the virtual object is exiting the visual hull. This assumes the frame rate is fast compared to the motion of the virtual and real objects. The consequence is that moving the virtual object along a vector defined in our algorithm approximates backing it out of collision. This assumption could be violated by, for example, a virtual bullet shot into a thin sheet of real plywood.
Approach
There are two steps in managing the interaction of virtual objects with real-object avatars.
The first step determines whether a virtual object, defined by the geometric primitives of its surface, intersects a real object, represented computationally by its visual hull volume.
The second step reduces or eliminates any unnatural interpenetration. While the simulation has detailed information about the virtual object, such as its velocity, acceleration, and material properties, no such data are available for the real object; our algorithm therefore does not use them. We do not track or model the real object, which the reconstruction system treats simply as an occupied volume.
It is not possible to determine the exact time and points of first collision for the virtual and real objects, nor how the collision came about, so we focus on resolving the erroneous interpenetration. Our primary aim is to estimate the point of first contact on each object. Once that is established, additional data, such as the contact normal or the velocity of the virtual object, can be incorporated to produce more accurate and realistic collision responses.
Figure 11 illustrates finding the collision points between a real object, a hand, and a virtual object, a teapot. Each triangle of the teapot is volume-queried to determine which parts of the virtual object lie within the visual hull; these locations are shown as blue points.
The algorithm first detects the collision by sampling points on the surface of the virtual object that lie within the visual hull; these are the collision points, CP i, shown as blue dots in Figure 11. From the collision points it selects the virtual object collision point, CP obj, the collision point farthest from a reference point, RP obj, which is typically the center of the virtual object and is shown as a green dot in Figure 11.
To resolve the collision at CP obj, we determine a recovery vector, V rec, that points from CP obj toward RP obj. This vector intersects the visual hull at a point we call CP hull, which establishes the distance required to move CP obj out of collision.
The algorithm computes the distance from CP obj to CP hull to determine how far to move the virtual object along V rec, and, when required, it also computes the visual hull's surface normal at CP hull. The following sections describe how these values are obtained, for collision detection and response between a single virtual object and a single real object.
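The detection and response steps just described can be sketched on the CPU as follows (illustrative only: the actual algorithm performs these volume queries in graphics hardware with stencil tests, the triangle sampling density here stands in for the u x v viewport, and `project_fn` follows the earlier camera-model sketch):

```python
import numpy as np

def in_visual_hull(P, cameras):
    """Volume-query one 3-D point: it is inside the visual hull iff it maps
    to an object pixel in every camera's object-pixel map.  `cameras` is a
    list of (project_fn, object_pixel_map) pairs."""
    for project_fn, obj_map in cameras:
        u, v = np.rint(project_fn(P)).astype(int)
        h, w = obj_map.shape
        if not (0 <= v < h and 0 <= u < w and obj_map[v, u]):
            return False
    return True

def sample_triangle(A, B, C, n=10):
    """Regular samples on triangle ABC; plays the role of the u x v viewport."""
    pts = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            pts.append(A + (i / n) * (B - A) + (j / n) * (C - A))
    return pts

def detect_and_respond(triangles, RP_obj, cameras, max_pen=0.05, step=0.001):
    # 1) Collision points CP_i: sampled surface points inside the visual hull.
    CP = [P for tri in triangles for P in sample_triangle(*tri)
          if in_visual_hull(P, cameras)]
    if not CP:
        return None
    # 2) CP_obj: the collision point farthest from the reference point RP_obj.
    CP_obj = max(CP, key=lambda P: np.linalg.norm(P - RP_obj))
    # 3) V_rec: unit recovery vector from CP_obj back toward RP_obj.
    V_rec = (RP_obj - CP_obj) / np.linalg.norm(RP_obj - CP_obj)
    # 4) March along V_rec until leaving the hull: the exit point estimates
    #    CP_hull and the distance traveled estimates D_rec (capped at 5 cm,
    #    the assumed maximum interpenetration).
    D_rec, P = 0.0, CP_obj.copy()
    while in_visual_hull(P, cameras) and D_rec < max_pen:
        P = P + step * V_rec
        D_rec += step
    return CP, CP_obj, P, V_rec, D_rec
```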
Performance Analysis
Given n cameras and virtual objects with m triangles, with each triangle tested in a u x v viewport within an x x y window, the geometry transformation cost is n * m triangles per frame. The fill-rate cost is (n * m * u * v) / 2 pixels per frame, since each triangle fills roughly half of its viewport.
In our hybrid environment, three cameras and the 720 triangles of the curtains (Figure 16) were tested with a 10 x 10 viewport per triangle within a 400 x 400 window, giving a per-frame cost of (n * m * u * v) / 2 pixels filled and (x * y) pixel readbacks to locate the colliding pixels. At the measured collision detection rate of 6 frames per second, the geometry transformation cost is 3 * 720 * 6 Hz = 12,960 triangles per second, the fill rate is (3 * 720 * 10 * 10) / 2 * 6 Hz = 648,000 pixels per second, and there are 160,000 pixel readbacks and compares per pass.
For collision response, the transformation cost is 2 triangles per virtual object in collision, and the fill rate is (x * y * n) = (400 * 400 * 3) = 480,000 pixels per collision.
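These rates follow directly from the cost formulas (a quick arithmetic check):

```python
n, m = 3, 720           # cameras, curtain triangles
u = v = 10              # per-triangle viewport [pixels]
x = y = 400             # collision-detection window [pixels]
hz = 6                  # measured collision-detection rate

print(n * m * hz)                 # 12,960 triangles transformed per second
print(n * m * u * v // 2 * hz)    # 648,000 pixels filled per second
print(x * y)                      # 160,000 pixel readbacks per pass
```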
Our initial implementation of the collision detection and response algorithms handles approximately 13,000 triangles per second and is not fully optimized. This is well below the theoretical limits of the SGI hardware, and further optimization, particularly reducing the number of OpenGL state changes, should improve its efficiency.
Accuracy Analysis
Collision detection accuracy depends on image segmentation, the camera models, and the viewport size. Chapter 3.6 analyzes how image segmentation and camera model errors affect the accuracy of the visual hull. Here we consider the effect of viewport size: a larger viewport means more closely spaced volume-queried points on each triangle, so the spacing of these points is the triangle's world-space bounding-box size divided by the viewport resolution (u x v).
For a square viewport (u = v), each triangle is projected so that it maximally fills the viewport, occupying roughly half of its pixels, so the collision detection accuracy is the triangle's bounding-box dimensions divided by the viewport size, x / u by y / u, where x and y here are the two longest dimensions of the triangle in world space.
Volume-querying primitives at higher resolution in larger viewports detects more closely spaced collision points and reduces error. However, larger viewports limit the number of triangles that can be queried in a single collision detection window: if the primitives cannot each be assigned a viewport within one frame buffer, multiple volume-query and read-back passes are needed to test all the triangles.
Choosing the degree of parallelism is thus a speed-accuracy tradeoff: more viewports increase execution speed but reduce the pixel resolution available for each primitive's collision calculation, and smaller viewports, while allowing greater parallelism, may miss collisions.
For tabletop-sized virtual objects, individual triangles typically have bounding boxes averaging about 2 cm per side, giving a collision-point detection error of about 0.2 cm by 0.2 cm. For example, the virtual sphere shown in Figure 15 had a radius of 10 cm and 252 triangles, with an average triangle bounding box of 1.3 cm by 1.3 cm, so collisions were detected at a resolution of 0.13 cm, finer than the errors in visual hull location and shape. The cloth system had nodes spaced 7.5 cm by 3 cm apart, giving a collision detection resolution of 0.75 cm by 0.3 cm. These values are the spatial frequency of the volume-querying and bound the error in locating the collision points.
For collision response, we examine the calculation of CP hull, which determines the distance D rec along the recovery vector needed to reposition the virtual object, and the uncertainty in the N hull vector. The accuracy of locating CP hull along V rec depends on the resolution of the collision response window and on the length, l, of the major axis of triangle ABC in meters.
We assume a square window, which in practice is the same as the collision detection window; the accuracy of locating CP hull is then l / x. Based on Assumption 5, that the frame rate is fast compared to the motion of the objects, we assume an initial interpenetration of at most 5 cm.
With a 400 x 400 window, this gives a detection error of 0.0125 cm for CP hull. If the penetration exceeds 5 cm, we double the size of triangle ABC and volume-query again. These errors are substantially smaller than the errors in the visual hull's location and shape.
The surface normal at CP hull, N hull, is estimated from the cross product of vectors between surface points sampled a short distance from CP hull. How well these points represent the surface at CP hull depends on the surface topology, their distance from CP hull, the distance from CP hull to CP obj, and the volume-querying detection error (0.0125 cm).
We estimate the errors in the collision detection and response values, independent of errors in visual hull shape and location, assuming 2 cm virtual triangles, 10 x 10 viewports, and a 400 x 400 window resolution:
Collision points (CP i )– 0.75 cm error
Point of first contact on the virtual object (CP obj )– 0.75 cm error
Point of first contact on the visual hull given the collision points (CP hull ) – 0.0125 cm error
Distance along the recovery vector to move the virtual object (D rec) – 0.0125 cm error
Algorithm Extensions
Figure 16 – Sequence of images taken from a VE where the user can interact with the curtains to look out the window
The collision detection algorithm also handles deformable virtual objects such as cloth; Figure 16 shows a sequence of frames in which the user pushes aside a curtain. Collisions are detected independently for each triangle in the cloth mesh, and the resulting recovery vectors and distances are applied as displacement vectors to the corresponding cloth simulation nodes. Because the curtains should only translate horizontally, the application specifies a motion direction instead of using the object's center, so the collision response is constrained to that direction.
The vector from CP obj to RP obj is our best estimate of how the virtual object came to intersect the visual hull. The object center need not always be used as RP obj; for highly symmetric objects such as spheres, for example, the object center from the previous time-step can be used instead.
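For a motion-constrained object such as the curtain, the recovery translation can simply be projected onto the allowed direction (a minimal sketch; `motion_dir` is the application-specified unit vector and is an assumption of this example):

```python
import numpy as np

def constrained_recovery(V_rec, D_rec, motion_dir):
    """Keep only the component of the recovery translation D_rec * V_rec
    that lies along the allowed motion direction (e.g., horizontal curtain
    travel)."""
    motion_dir = motion_dir / np.linalg.norm(motion_dir)
    return np.dot(D_rec * V_rec, motion_dir) * motion_dir
```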
Volume-querying can also be performed with primitives other than the surfaces of virtual objects, for example to compute distance fields. Such proximity information could be used to render the thermal radiation of real objects onto virtual ones, to model magnetic fields generated by real objects, or to represent obstacles in a motion-planning simulation.
The depth buffer from the novel-viewpoint reconstruction can also be converted into a polygonal mesh, which we have used as a set of collision objects in a particle system. After each reconstruction, the updated surface is passed to the particle system. In Figure 17, a water particle system interacts with a user holding a real plate; the user and plate are reconstructed from an overhead viewpoint, and the resulting depth buffer is used by the water particle system.
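Converting the reconstruction's depth buffer into a collision mesh can be sketched as a height-field triangulation (illustrative; `unproject`, which maps a pixel and its depth back to a world-space point, is assumed to be provided):

```python
def depth_buffer_to_mesh(depth, unproject, step=4):
    """Turn a (H x W) depth buffer into a triangle list usable as a
    collision surface; neighboring samples are joined into two triangles."""
    H, W = depth.shape
    tris = []
    for v in range(0, H - step, step):
        for u in range(0, W - step, step):
            p00 = unproject(u, v, depth[v, u])
            p10 = unproject(u + step, v, depth[v, u + step])
            p01 = unproject(u, v + step, depth[v + step, u])
            p11 = unproject(u + step, v + step, depth[v + step, u + step])
            tris.append((p00, p10, p11))
            tris.append((p00, p11, p01))
    return tris
```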
The real-object avatars of the plate and the user act as collision surfaces for the particle system. The hand and plate cast shadows on the virtual objects and interact with the water particles.
User Study
Purpose
This study examined how the interaction methodology and the visual fidelity of avatars affect task performance and sense-of-presence in cognitive manual tasks. We were motivated by a desire to understand which elements make virtual environments effective and to evaluate a new system designed for natural interaction with visually faithful avatars.
The real-time object reconstruction system makes it possible to assess how interacting with real objects and seeing a visually accurate avatar affect task performance and presence; studying these questions was previously difficult because of the modeling and tracking burdens of conventional approaches.
The system lets us examine how interacting with real versus virtual objects affects performance, measured by completion time, on a cognitive task. These results bear on training and assembly verification applications, in which users routinely solve problems involving tools and parts.
It also lets us examine how visually faithful avatars, compared with generic avatars, affect sense-of-presence in immersive VEs. The results shed light on whether investing in high-fidelity self-avatars is worthwhile, which matters to designers of applications, such as phobia treatment and entertainment, that aim to maximize participant presence.
Background
The Effective Virtual Environments (EVE) research group at the University of North Carolina at Chapel Hill is conducting foundational research to determine which components make virtual environments effective, building on previous studies of the impact of passive haptics on presence and learning. The EVE group's prior work has examined physiological measures of sense-of-presence and the influence of static haptics, locomotion, and rendering field of view on presence, learning, and task performance in VEs. Common metrics for VE effectiveness include task performance, sense-of-presence, learning, and behavioral and physiological measures.
The Virtual Environments and Computer Graphics research group at University College London, led by Mel Slater, has conducted extensive user studies showing that having an avatar raises users' self-reported sense-of-presence. They propose that visually faithful avatars, as opposed to generic ones, could raise it further. Similarly, Heeter and Welch observed that having an avatar improved immersion in virtual environments, and they hypothesize that a visually accurate avatar would improve it even more.
We are interested in determining whether performance and sense-of-presence in VEs with cognitive tasks would significantly benefit from interacting with real objects rather than virtual objects.
Virtual environments are valuable for training, simulation, and experimentation on high-cost or hazardous tasks. They let users examine, modify, and evaluate virtual designs far more quickly and cheaply than building physical prototypes. Researchers acknowledge that training with real objects is generally more effective, but how much does interacting with real objects improve the experience? It is essential to assess whether the benefit of engaging with real objects justifies the associated costs.
Task
In making our design decisions, we aimed to abstract tasks typical of VE design applications so that our results would generalize across VEs. Our survey of production VEs indicated that many of their goals involve participants performing spatial cognitive manual tasks.
We use the following definition for spatial tasks:
Spatial ability encompasses three key dimensions: spatial orientation, which involves the mental movement or transformation of stimuli; spatial visualization, where individuals manipulate objects using themselves as a reference point; and spatial relations, which focuses on understanding and manipulating the relationships within an object.
Training and design review tasks executed in VEs typically have spatial components that involve solving problems in three dimensions.
“Cognition is a term used to describe the psychological processes involved in the acquisition, organisation and use of knowledge – emphasising the rational rather than the emotional characteristics” [Hollnagel02].
The VE applications we seek to inform typically have a substantial cognitive component; in layout applications, for example, users evaluate different configurations and designs. These tasks draw primarily on spatial and cognitive abilities rather than on motor skills or emotional judgments, similar to the abilities measured by widely used intelligence tests.
We specifically wanted to use a task that involves cognition and manipulation while avoiding tasks that primarily focus on participant dexterity or reaction speed for the following reasons:
• Variability in participant dexterity is difficult to control and could swamp the effect of interaction in the assessment measures. We therefore wanted a task whose physical actions are simple and easily understood, performed in order to achieve a cognitive result.
• Assembly design and training tasks performed in VEs typically do not have significant dexterity or reaction-speed components; most immersive VEs avoid tasks that depend on perceptual motor skills.
• VE technical limitations on interactions would limit many reaction speed-based tasks For example, a juggling simulator would be difficult to develop, test, and interact with, using current technology.
• Outcome measures would be strongly affected by technical limitations such as tracking error and display resolution, as well as by variation in human dexterity, and identifying all the important interactions and confounding variables would be a considerable challenge.
The task we designed is similar to, and based on, the block design portion of the Wechsler Adult Intelligence Scale (WAIS).
The WAIS, developed in 1939, is a widely used test of intelligence quotient (IQ). It consists of two primary components, verbal and performance, each containing subsections such as comprehension, arithmetic, and picture arrangement. The block-design component, part of the performance subsection, evaluates reasoning, problem solving, and spatial visualization.
In the traditional WAIS block design task, participants manipulate small one-inch plastic or wooden cubes to replicate target patterns consisting of either four or nine blocks. Each cube has two all-white faces, two all-red faces, and two faces that are half white and half red, divided diagonally. Whether borders are drawn around the target patterns affects the difficulty of the task.
The WAIS scores a participant's ability to replicate the patterns accurately, awarding additional points for speed; each target pattern has a time limit that varies with its difficulty and size.
We could not use the WAIS block design subtest directly for two reasons. First, the WAIS test and its patterns are copyrighted, so our user study patterns are original designs. Second, instead of presenting a sequence of patterns of gradually increasing difficulty, we used a randomized order of patterns judged, through pilot testing, to be of roughly equal difficulty.
The small one-inch cubes of the WAIS would also be difficult to manipulate virtually, given the limits of camera resolution and reconstruction accuracy, so we increased the block size to a three-inch cube, which works well with the reconstruction system.
Figure 18 – Image of the wooden blocks manipulated by the participant to match a target pattern.
Participants manipulated nine identical 3-inch wooden blocks to make the top faces of the blocks match a target pattern. Each cube's six faces carried distinct quadrant-divided patterns in white and blue, as shown in Figure 18.
Two sizes of target pattern were used: small four-block patterns arranged 2 x 2 and large nine-block patterns arranged 3 x 3. Appendix A.9 shows the patterns used in the study.
We examined two dependent variables: task performance and sense-of-presence. Task performance was the time, in seconds, a participant took to arrange the blocks to match a target pattern, compared between a real-world baseline condition and a VE condition. Sense-of-presence was the score on a questionnaire administered after the experience.
The user study was a between-subjects design in which each participant performed the task in the real space environment (RSE) and then in one of three VE conditions. The independent variables were the interaction modality (real versus virtual blocks) and avatar fidelity (generic versus visually faithful), yielding three VE conditions:
• Virtual objects with generic avatar (purely virtual environment - PVE)
• Real objects with generic avatar (hybrid environment - HE)
• Real objects with visually faithful avatar (visually-faithful hybrid environment – VFHE)
Final Study Experiment Conditions
Participants performed the block pattern-matching task in a real space environment (RSE) without any virtual equipment, and then performed the same task in one of the three VEs: purely virtual (PVE), hybrid (HE), or visually-faithful hybrid (VFHE). Figure 19 shows the environments in which participants performed the block-manipulation task.
Figure 19 – Each participant performed the task in the RSE and then in one of the three VEs.
The participants were randomly assigned to one of three groups: 1) RSE then PVE, 2) RSE then HE, or 3) RSE then VFHE.
In the Real Space Environment (RSE), the participant was seated at a desk with a rectangular enclosure measuring 36" x 25" x 18" that contained the nine wooden blocks. The enclosure, open on the side facing the participant, was draped with dark cloth and lit by two small lights inside.
Figure 20 – Real Space Environment (RSE) setup. The user watches a small TV and manipulates wooden blocks to match the target pattern.
A 13-inch television on top of the enclosure showed the video feed from a lipstick camera mounted inside, approximating the participant's line of sight. The participant watched the TV while performing the task, with the target pattern placed next to the TV.
RSE Equipment: Real blocks, TV, lipstick camera, cloth-draped enclosure, small lights.
In the purely virtual environment (PVE), the participant worked at a four-foot-high table positioned at the center of the reconstruction volume. The participant wore Fakespace Pinchgloves fitted with Polhemus Fastrak magnetic trackers and a Virtual Research V8 head-mounted display (HMD).
Figure 21 – Purely Virtual Environment (PVE) setup. The user wore tracked pinchgloves and manipulated virtual objects.
Figure 22 – PVE participant's view of the block manipulation task.
The gloves provided the interaction mechanism with the virtual objects. Pinching the thumb and forefinger together signaled the system to grasp a virtual object, and the avatar's hand was drawn in a grasping pose; releasing the pinch signaled the system to drop the object, and an open hand was drawn.
Participants manipulated the virtual blocks with a generic avatar. The virtual block closest to the avatar's hand was highlighted, indicating that a pinch would grasp that block.
When a participant pinched a highlighted virtual block, the block attached to the avatar's hand, giving the impression of holding it. To rotate the block, the participant rotated their hand while maintaining the pinch.
If a block was released within six inches of the workspace surface, it snapped into an unoccupied position on the three-by-three grid; this streamlined the interaction and reduced task completion time. If released slightly farther away, the block simply dropped onto the table, and if released more than six inches above the surface, the block floated in mid-air, which made rotating virtual blocks quicker. There was no collision detection between blocks; participants generally resolved any interpenetration themselves.
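The release behavior amounts to a simple snap rule; the sketch below is one possible interpretation (the six-inch threshold and 3 x 3 grid come from the description above, while the use of horizontal distance to the grid and the helper names are assumptions):

```python
def on_release(block_pos, grid_cells, occupied, table_height, snap_dist=0.1524):
    """Decide where a virtual block goes when the pinch is released.
    snap_dist ~ six inches in meters; positions are (x, y, z) with z up.
    Assumes grid_cells are (x, y, z) tuples at table height and that at
    least one cell is free."""
    x, y, z = block_pos
    if z - table_height > snap_dist:
        return block_pos                           # float in mid-air where released
    free = [c for c in grid_cells if c not in occupied]
    nearest = min(free, key=lambda c: (c[0] - x) ** 2 + (c[1] - y) ** 2)
    if (nearest[0] - x) ** 2 + (nearest[1] - y) ** 2 <= snap_dist ** 2:
        return nearest                             # snap into the free grid cell
    return (x, y, table_height)                    # otherwise drop onto the table
```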
The target pattern appeared on a card in the environment, and the PVE ran at a minimum of twenty frames per second. Figure 22 shows a screenshot of the images presented to the participant.
PVE Equipment: Fakespace Pinchgloves, Polhemus Fastrak trackers, Virtual Research V8 HMD. All of the virtual environments were rendered on the SGI Reality Monster graphics supercomputer in Sitterson Hall at the University of North Carolina at Chapel Hill, using a single rendering pipe with four raster managers.
In the hybrid environment (HE), the participant worked at the same table as in the PVE, wearing the V8 HMD and yellow dishwashing gloves. No additional equipment or trackers were required, because the reconstruction system produced real-time virtual representations of both the user and the blocks. The physical and virtual settings otherwise closely matched those of the PVE.
Figure 23 – Hybrid Environment (HE) setup. The participant manipulated real objects while wearing dishwashing gloves, which provided a generic avatar.
Figure 24 – HE participant's view of the block manipulation task.
Real blocks, identical to those in the RSE, were used; both the participant and the blocks were incorporated into the VE by the real-object reconstruction system. The HMD displayed the reconstruction of the participant within the VE, textured with images from the HMD-mounted camera.
Because every participant wore identical dishwashing gloves, the avatar was accurate in shape but generic in appearance. The HE and VFHE ran at a minimum of twenty frames per second, and the reconstruction algorithm ran at a minimum of twelve frames per second. Figure 24 shows a screenshot of the images presented to the participant.
HE Equipment: Real blocks, HMD with mounted lipstick camera (with a similar line-of-sight as the user), three wall-mounted cameras for reconstruction system, dishwashing gloves.
Visually-Faithful Hybrid Environment (VFHE)
The visually-faithful hybrid environment (VFHE), shown in Figure 25, was identical to the HE except that participants did not wear gloves.
Figure 25 – Visually Faithful Hybrid Environment (VFHE) setup. Participants manipulated real objects and were presented with a visually faithful avatar.
Figure 26 – VFHE participant's view of the block manipulation task.
The avatar was visually faithful to the participant, textured with images from the HMD-mounted camera, so participants saw an accurate representation of their own hands, complete with imperfections. Figure 26 shows a screenshot of the images presented to the participant.
VFHE Equipment: Real blocks, HMD with mounted lipstick camera, three wall-mounted cameras for reconstruction system.
In all three conditions (PVE, HE, and VFHE) the VE consisted of a table inside a generic room, accompanied by a real Styrofoam table to enhance spatial recognition. The room was lit with a radiosity-as-textures global illumination solution and contained several virtual objects, including a lamp, a plant, and a painting of the Mona Lisa. The enclosure used in the RSE was rendered transparently in the VE so that participants could see its contents without obstruction.
Figure 27 – Virtual environment for all three (PVE, HE, VFHE) conditions.
Measures
Audio, video, and written notes were recorded for all participants. Anonymity was maintained by using anonymous IDs throughout the experiment and data analysis.
Task Performance Measures
For task performance, we measured the time each participant took to correctly replicate each target pattern. We also noted when a participant incorrectly believed they had completed the task; in such cases the experimenter pointed out the error and the participant continued working on the pattern. Every participant eventually completed all patterns correctly.
Sense-of-presence Measures
For sense-of-presence, participants answered the Steed-Usoh-Slater Presence Questionnaire (SUS) [Usoh00] after completing the task in the VE condition.
Other Factors
To observe the correlation of spatial ability with task performance, we administered the Guilford-Zimmerman Aptitude Survey, Part 5: Spatial Orientation. We used the Kennedy-Lane Simulator Sickness Questionnaire to evaluate discomfort and simulator sickness; administering it both before and after the experience let us gauge the effect of the different environments on simulator sickness.
Participant Reactions
At the end of the session, we interviewed each participant about their impressions of the experience. We also recorded self-reported and experimenter-reported behaviors.
Data from forty participants were analyzed: thirteen in the purely virtual environment (PVE), thirteen in the hybrid environment (HE), and fourteen in the visually-faithful hybrid environment (VFHE). Two participants were unable to complete the experiment because of equipment issues, one because of data recording errors and another because of nausea experienced in the PVE; their data were excluded from the results.
A total of forty-one participants took part: thirty-three men and eight women. The group included thirty-one undergraduates, three master's students, three PhD candidates, one staff member, and two spouses of graduate students. Participants were recruited through short presentations in UNC undergraduate computer science classes and by word of mouth.
Participants had little prior immersive virtual reality experience, with fewer than two prior sessions each. Their reported prior VR experience was M = 1.37 (s.d. = 0.66) on a scale from 1 (Never before) to 7 (A great deal).
Most participants reported using a computer a great deal: M = 6.39 (s.d. = 1.14, min = 3, max = 7) on a scale from 1 (Not at all) to 7 (Very much so).
Most participants reported playing computer or video games between one and five hours per week over the past three years: M = 2.85 (s.d. = 1.26, min = 1, max = 5) on a scale from "Never" to "More than 10 hours per week".
There were no significant differences between the groups in previous VR experience, computer usage, or video game play.
During the recruiting process, we listed the following restricting factors.
• Participants must be ambulatory (able to walk without assistance) and have use of both hands.
• Participants must have 20/20 vision in both eyes or as corrected
• Participants cannot have a history of epilepsy, seizures, or strong susceptibility to motion sickness
• Participants must be able to comfortably communicate in spoken and written English.
• Participants cannot have significant previous experience (more than two sessions) with virtual reality systems.
• Participants must have taken or be currently enrolled in a higher-level mathematics course (Math31, Calculus of Functions of One Variable, or equivalent).
The following second set of criteria was verified at the beginning of each session:
• Participants must be in their normal state of good fitness at the time of the experiment; those who had taken sedatives, tranquilizers, decongestants, antihistamines, alcohol, or other significant medication within 24 hours of the session were excluded.
• Participants must be comfortable with the HMD display, and must easily fit the HMD on their heads.
Experiment Procedure
The study was carried out over three days, with sessions scheduled by condition for ease of setup and equipment use. Participants were randomly assigned to the study conditions and did not know which condition they had been assigned to.
Each participant went through a one-hour session that involved three stages:
• Pre-experience – forms and questionnaires were filled out
• The experience –
  o first, the block design task in the RSE
  o then the same task in one of the PVE, HE, or VFHE
• Post-experience – debriefing, and more questionnaires were filled out
Upon arriving, all participants read and signed a consent form (Appendix A.1).
All participants passed a final screening questionnaire assessing their physical and mental condition to confirm they were fit to perform the task. Three participants in the HE condition reported having had more than three alcoholic drinks in the preceding 24 hours; including or excluding their data did not change the statistical significance of the results, so their data were retained in the analysis.
The Kennedy-Lane Simulator Sickness Questionnaire was administered before and after the virtual experience to evaluate its impact on participants' physical well-being.
Participants then completed the Guilford-Zimmerman Aptitude Survey, Part 5: Spatial Orientation, reading the instructions and practicing for five minutes and then answering multiple-choice questions for ten minutes. This spatial ability test let us examine the relationship between spatial ability and the task performance measures.
Real Space
The participant was seated at the RSE setup containing the nine identical painted wooden blocks and was asked to manipulate the blocks, watching their actions on the TV, so as to reproduce a series of target patterns; the time to complete each pattern correctly was recorded. Participants were encouraged to familiarize themselves with the blocks before starting the task.
Participants first learned the procedure by completing six practice patterns, three small (2 x 2) and three large (3 x 3). They were told how many blocks each pattern contained and asked to signal the experimenter as soon as they believed they had reproduced the pattern correctly. After the practice patterns, participants moved on to the first test pattern. Although each participant saw the patterns in a different order, all participants worked the same set of twenty patterns: six real space practice patterns, six real space timed patterns, and four each of VE practice and VE timed patterns.
We measured the time each participant took to complete each test pattern correctly. If a participant believed a pattern was complete when it was not, we recorded an error and asked them to continue while the clock kept running; the total time to complete the pattern was the performance measure for that trial.
Participants were asked to randomize the positions and orientations of the blocks, and the procedure continued until all six timed test patterns, three small and three large, had been completed.
Virtual Space
The experimenter helped the participant put on the HMD and any additional equipment required by the VE condition (PVE – tracked pinch gloves; HE – dishwashing gloves).
After a period of adjustment to the VE, the participant practiced on two small and two large patterns and was then timed on two small and two large test patterns.
Participants were told they could ask questions and take breaks between patterns if they desired Only one person (a PVE participant) asked for a break.
After completing the task, participants were interviewed about their impressions and reactions in a semi-structured debriefing session. The listed questions served as starting points, and the interviewer followed up on responses in greater depth. We used axial coding in analyzing the post-experience interviews to identify trends and correlate responses, giving insight into participants' subjective experience. Not all participants answered every question, and many focused on suggestions for improving the environment regardless of the question asked.
Participants completed the simulator sickness questionnaire a second time, allowing us to evaluate any changes in their simulator sickness levels during the task by comparing pre- and post-experience scores (Appendix A.7).
Finally, a modified Slater – Usoh – Steed Virtual Presence Questionnaire (Appendix A.8) was given to measure the participants' level of presence in the VE.
Managing Anomalies
If the head tracker lost tracking or crashed, we quickly restarted the system.
Participants were highly focused on their tasks and often unaware of any issues that arose. Even long tracking failures (particularly for taller participants, who posed challenges for our aging HiBall tracker) did not significantly affect overall task performance time.
A set of alternative patterns was available for replacing voided trials, such as instances where a participant accidentally dropped a block on the floor. This method was utilized twice, and the substitutions were duly recorded.
The study employs statistical analysis to examine the impact of the virtual environment (VE) conditions (purely virtual, hybrid, and visually-faithful hybrid) on participant task performance. The key dependent variable for assessing performance differences is the time taken to accurately replicate a target pattern within each VE condition compared to the real space environment (RSE). Additionally, the research evaluates participants' sense of presence through the Steed-Usoh-Slater Presence Questionnaire, focusing on the sense-of-presence scores as another dependent variable to identify variations across the different VE conditions.
A two-tailed t-test is employed to assess whether the differences in observed values between groups arise from chance or reflect a true difference between conditions. The t-test's p-value quantifies this likelihood, with results deemed significant if the p-value falls below the α value, here α = 0.05. This threshold accepts a 5% risk of committing a Type 1 error, in which a result is incorrectly identified as significant. In other words, if there were no true difference between conditions, a difference in means at least as large as the one observed would occur by chance less than 5% of the time.
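As a minimal sketch of this kind of analysis (the Welch, unequal-variance form of the t-test used in the tables that follow), the snippet below compares two groups of hypothetical completion-time differences; the values are placeholders, not the study's data.

```python
# Minimal sketch of a two-tailed Welch (unequal-variance) t-test.
# The timing differences below are illustrative placeholders, not the study's data.
from scipy import stats

pve_minus_rse  = [31.2, 28.4, 40.1, 25.7, 33.9]   # hypothetical per-participant differences (s)
vfhe_minus_rse = [14.6, 12.1, 19.8, 10.4, 15.2]

t_stat, p_value = stats.ttest_ind(pve_minus_rse, vfhe_minus_rse, equal_var=False)

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, "
      f"{'significant' if p_value < alpha else 'not significant'} at alpha = {alpha}")
```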
Hypotheses
Task Performance: Participants who manipulate real objects will complete a spatial cognitive manual task in less time than will participants who manipulate corresponding virtual objects.
Sense-of-presence: Participants represented in the VE by a visually faithful self-avatar will report a higher sense-of-presence than will participants represented by a generic self-avatar.
Our initial hypothesis suggested that the performance gap between the hybrid environment (HE) and visually-faithful hybrid environment (VFHE) relative to the real space environment (RSE) would be smaller than the gap observed between the purely virtual environment (PVE) and the RSE. That is, engaging with real objects enhances task performance.
Our second hypothesis posited that participants would report a greater sense of presence in the visually-faithful hybrid environment (VFHE) than in either the purely virtual environment (PVE) or the hybrid environment (HE). That is, higher avatar visual fidelity enhances the sense of presence.
• Further, we expected no significant difference in task performance for participants in the VFHE and HE conditions, i.e., interacting with real objects improved task performance regardless of avatar visual fidelity.
• Similarly, we anticipated no notable difference in the sense of presence between participants in the HE and PVE conditions, suggesting that generic hand avatars would produce comparable effects on presence, irrespective of the presence of real objects.
Results
Task Performance
The task performance results are in Appendix B.1.
(Summary table of task performance times: Mean, S.D., Min, and Max of Small Pattern Time and Large Pattern Time, in seconds, for each condition; full data in Appendix B.1.)
Table 3 – Difference in Task Performance between VE condition and RSE
Small Pattern Time (seconds) | Large Pattern Time (seconds)
Visually Faithful Hybrid – Real Space: 13.14, 8.09 | 35.20, 18.03
Figure 28 – Difference between VE and RSE performance for Small Patterns The lines represent the mean difference in time for each VE condition.
Figure 29 – Difference between VE and RSE performance for Large Patterns The lines represent the mean difference in time for each VE condition.
Table 4 – Between Groups Task Performance Comparison
Small Pattern: t-test with unequal variance, p-value | Large Pattern: t-test with unequal variance, p-value
In both small and large patterns, the performance of VFHE and HE tasks was significantly superior to that of the PVE task. However, there was no significant difference in task performance between HE and VFHE at the α=0.05 level.
Participants in the purely virtual environment had significantly longer completion times for the block-pattern task than in the real space environment, taking 2.84 times longer for small patterns and 3.23 times longer for large patterns.
Participants in the hybrid environment (HE) and visually-faithful hybrid environment (VFHE) showed smaller performance differences compared to the real space environment. Specifically, HE participants took 2.16 to 2.55 times longer to complete tasks, while VFHE participants took only 1.92 to 2.04 times longer, as detailed in Table 5.
Table 5 – Relative Task Performance Between VE and RSE
Mean, S.D., Min, Max (Small Pattern) | Mean, S.D., Min, Max (Large Pattern)
The SUS Presence Questionnaire concluded with a final question assessing participants' perceived task achievement on a scale from 1 (not very well) to 7 (very well). The results indicated a significant difference between the VFHE group, which scored an average of 5.43, and the PVE group, which had an average score of 4.57 (t = 2.23, p = 0.0345, significant at the α=0.05 level).
Table 6 – Participants' Response to How Well They Thought They Achieved the Task
How well do you think you achieved the task? (1 – 7)
Sense-of-presence
The complete sense-of-presence results are in Appendix B.B.
We augmented the standard Steed-Usoh-Slater Presence Questionnaire with two questions that focused on the participants’ perception of their avatars The entire questionnaire is included as Appendix A.6.
• How much did you associate with the visual representation of yourself (your avatar)? During the experience, I associated with my avatar (1 not very much, 7 very much)
• How realistic (visually, kinesthetically, interactivity) was the visual representation of yourself (your avatar)? During the experience, I thought the avatar was (1 not very realistic, 7 very realistic)
Table 7 – Steed-Usoh-Slater Sense-of-presence Scores for VEs
Total Sense-of-presence Score (scale from 0 to 6)
Figure 30 – Raw Steed-Usoh-Slater Sense-of-presence Scores. The horizontal lines indicate means for the VE conditions. Note the large spread of responses.
Table 8 – Steed-Usoh-Slater Avatar Questions Scores
Mean, S.D., Min, Max | Mean, S.D., Min, Max (one set per avatar question)
Table 9 – Comparing Total Sense-of-presence Between Conditions
Between Groups Total Sense-of-presence: t-test with unequal variance, p-value
Other Factors
Simulator sickness was not significantly different between the groups at the α = 0.05 level.
The complete results are included as Appendix B.5.
The analysis of spatial ability revealed no significant differences between the groups, as indicated in Table 10, with detailed test results available in Appendix B.6. This suggests that the participants' foundational spatial skills did not introduce bias among the groups. Additionally, a moderate correlation was found between spatial ability and performance, with correlation coefficients of r = -0.31 for small patterns and r = -0.38 for large patterns.
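For reference, a correlation of this kind can be computed as sketched below; the scores and times are hypothetical placeholders (the study's coefficients were r = -0.31 and r = -0.38), so this illustrates only the calculation, not the study's data.

```python
# Sketch of the spatial-ability vs. completion-time correlation (hypothetical data).
from scipy import stats

spatial_scores = [12, 18, 25, 31, 22, 15, 28]                     # Guilford-Zimmerman Part 5 (hypothetical)
small_pattern_times = [48.0, 41.5, 36.2, 30.8, 39.0, 45.5, 33.1]  # seconds (hypothetical)

r, p = stats.pearsonr(spatial_scores, small_pattern_times)
print(f"r = {r:.2f}, p = {p:.3f}")  # a negative r means higher spatial ability, shorter times
```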
Table 10 – Simulator Sickness and Spatial Ability Between Groups
Simulator Sickness: t-test with unequal variance, p-value | Spatial Ability: t-test with unequal variance, p-value
Discussion
Task Performance
Hypothesis: Participants who manipulate real objects will complete a cognitive manual task in less time than will participants who manipulate corresponding virtual objects.
In our investigation, we found that engaging with real objects led to a significant enhancement in performance for cognitive manual tasks compared to virtual objects. While task performance across all virtual environment conditions was notably inferior to that in the real space environment, participants in the hybrid and visually-faithful hybrid conditions performed significantly better than those in the purely virtual environment.
There is a slight difference between HE and VFHE performance (Table 4, p=0.055), and we do not have a hypothesis as to the cause of this result. This is a candidate for further investigation.
These results, showing significantly poorer task performance when interacting with purely virtual objects, lead us to believe that the same hindrances would affect practice, training, and learning of the task.
Handling real objects makes task performance and interaction in the VE more like the actual task.
Sense-of-presence
Hypothesis: Participants represented in the VE by a visually faithful self-avatar will report a higher sense-of-presence than will participants represented by a generic self-avatar.
Interviews revealed a preference for visually faithful avatars (VFHE condition), yet the reported sense of presence did not show a statistically significant difference when compared to those using generic avatars (HE and PVE).
The analysis revealed no statistically significant differences at the α=0.05 level across all eight sense-of-presence questions. This lack of variation was consistent both in the individual question responses and in the overall sense-of-presence score.
A study by Slater warns against using the SUS Questionnaire for comparing presence in different virtual environments, noting that no existing questionnaire effectively supports such comparisons. While the SUS Questionnaire did not reveal a presence effect, this does not imply that one was absent.
Post-experience interviews revealed that participants in the purely virtual condition appreciated the responsiveness of their avatars, noting, "the avatar moved when I did," and rated avatar-related questions highly. Similarly, those in the visually faithful avatar condition remarked, "Yeah, I saw myself," also giving high marks for avatar realism. Consequently, both groups exhibited comparable scores regarding the realism of their avatars.
In retrospect, the various elements of self-avatars (appearance, movement, and interactivity) might have been better analyzed through distinct questions. Participants consistently showed a preference for movement over appearance, suggesting that kinematic fidelity is more crucial than visual fidelity for enhancing the sense of presence. Future research could explore the effects of visual fidelity independent of dynamic fidelity on sense of presence; however, we anticipate that this exploration may yield limited results, as we believe the influence of visual fidelity is relatively weak.
Debriefing Trends
We list here the major trends and discuss all trends in more detail later.
• Participants in the PVE and HE conditions focused on the accuracy of motion in their virtual body representations, while those in the VFHE condition emphasized the importance of appearance fidelity. This suggests that the perception of appearance fidelity may inherently encompass the fidelity of motion.
• Participants in all groups responded that they were almost completely immersed when performing the task.
• Participants across all groups reported that the presence of virtual objects in the environment, including a painting, plant, and lamp, enhanced their sense of immersion, despite not interacting directly with these items.
• Participants in all groups responded that seeing an avatar improved their sense-of-presence.
• 7 out of 27 VFHE and HE participants mentioned that tactile feedback of working with real objects improved their sense-of-presence.
The following interview trends are consistent with results of previous research or our experiences with VEs:
• Being involved in a task heightened sense-of-presence.
• Interacting with real objects heightened sense-of-presence [Insko01].
• System latency decreased sense-of-presence [Meehan01].
Debriefing Results – Major Trends
A better picture of the effect of the visually faithful avatars and interacting with real objects can be drawn from the debriefing responses of the participants.
Participants in the PVE and HE conditions noted that the motion fidelity of the generic avatars enhanced their sense of presence. All feedback regarding avatar realism from participants in the PVE and HE conditions related to motion accuracy.
• “Once I got used to where the hands were positioned… it felt like they were my hands.”
• “It was pretty normal, it moved the way my hand moved Everything I did with my hands, it followed.”
• “They followed my motions exactly, I thought”
• “I thought they behaved pretty well I didn't feel like I was looking at them, though I felt I was using them more like a pointer, than the way I would look at my own hands.”
• One participant noted that the virtual room felt more immersive because of the synchronized movement of the hand avatars; as they moved their hands, the view responded to reflect those actions.
• "Being able to see my hands moving around helped with the sense of ‘being there’."
Many VFHE participants highlighted the importance of avatar visual fidelity in enhancing their sense of presence, with all feedback regarding avatar realism focusing specifically on visual accuracy.
• “Nice to have skin tones, yes (I did identify with them)”
• "Yeah, those were my hands, and that was cool I was impressed that I could see my own hands"
• "My hands looked very realistic… Yeah, they looked very real."
• “Appearance looked normal, looked like my own hands, as far as size and focus looked absolutely normal… I could see my own hands, my fingers, the hair on my hands”
Participants in the interviews indicated that when they saw a visually faithful avatar, they expected its movement to be equally accurate. This leads us to hypothesize that for VE users, kinematic fidelity matters more to self-avatar believability than visual fidelity.
Participants overwhelmingly felt fully immersed in the presented virtual environment during the task. Key factors contributing to this sense of presence included effective head tracking and the visibility of various objects within the virtual space.
Perhaps two quotes from the participants sum up the reconstructed avatars best:
• “I thought that was really good, I didn't even realize so much that I was virtual. I didn't focus on it quite as much as the blocks.”
• “I forget… just the same as in reality. Yeah, I didn't even notice my hands.”
• 75% of the HE and VFHE participants noticed reconstruction errors, and 25% noticed reconstruction lag.
• Many participants complained about the restricted field of view in the working environment; notably, the RSE had a similarly limited working volume and field of view yet drew no such complaints.
• 65% of the VFHE and 30% of the HE participants noted their avatar looked real
• 93% of the PVE and 13% of the HE and VFHE participants complained that the interaction with the blocks was unnatural.
• 25% of the HE and VFHE participants felt the interaction was natural
When asked what increased their sense-of-presence in the VE:
• 26% of the HE and VFHE participants said that having the real objects and tactile feedback increased their sense-of-presence
When asked what decreased their sense-of-presence in the VE:
• 43% of PVE participants commented that the blocks not being there or behaving as expected reduced their sense-of-presence.
• 11% of HE and VFHE participants also mentioned that manipulating real objects decreased their sense-of-presence because “they reminded them of the real world.”
• VFHE participants reported becoming comfortable interacting in the VE after fewer practice patterns than PVE participants (a mean of 1.50 versus 2.36 patterns), a statistically significant difference (T26 = 2.83, p = 0.0044) at the α=0.01 level.
• Two-handed interaction greatly improved performance over one-handed interaction.
Conclusions
Our study assessed the impact of interacting with real objects and of visually faithful self-avatars on task performance and sense-of-presence in a spatial cognitive task virtual environment. Interacting with real objects significantly improved task performance, whereas avatar visual fidelity did not produce a significant difference in reported sense-of-presence.
Interacting with real objects enhances task performance in spatial cognitive tasks, aligning results more closely with real-world scenarios. Participants using real objects in virtual environments (VEs) exhibit behaviors similar to those in actual settings. Our study revealed that manipulating virtual objects can lead to mistraining and the development of VE-specific strategies, even in straightforward tasks.
Training and simulation virtual environments (VEs) aim to replicate real-life experiences and benefit significantly from participants interacting with real objects. The motion constraints and tactile feedback from these objects enhance the realism of the experience, making it more akin to the actual task than purely virtual interactions. Even in the absence of a full real-object reconstruction system, incorporating the tracking, modeling, and instrumentation of real objects can greatly improve cognitive task performance.
Motion fidelity plays a crucial role in enhancing the believability of self-avatars, surpassing the importance of visual fidelity. Our hypothesis is that motion fidelity is the key factor contributing to self-avatar believability. While a visually accurate avatar may be preferable to a generic one, its impact on the sense of presence appears to be limited.
Designers of immersive virtual environments (VEs) should give priority to tracking the user (motion fidelity) over the detail of the rendered avatar model (visual fidelity). In the absence of a real-object reconstruction system, tracking the user and texture-mapping the self-avatar model with captured images of the user can provide good motion and visual fidelity, leading to a greater sense of immersion.
NASA Case Study
NASA Collaboration
We have initiated a collaboration with the NASA Langley Research Center (NASA LaRC) to explore how our system can aid in assessing payload designs and assembly layouts. Recognizing that space planning errors can lead to significant financial, scheduling, and personnel challenges, we have engaged with NASA experts across various engineering, science, and technical fields. Together, we have pinpointed critical tasks that could benefit from hybrid virtual environments (VEs). Insights from NASA LaRC, including motivations, comments, and suggestions, have been gathered through surveys, interviews, and informal discussions during our experiments.
Driving Problems
NASA LaRC payload designers are interested in examining models of payloads and payload subsystems for two major tasks: assembly verification and assembly training.
NASA LaRC payload designers aim to identify potential assembly, integration, and testing challenges early in the project development cycle. Currently, various subsystems are subcontracted independently, leading to compatibility and layout issues during integration. Despite meticulous specifications for subsystem design, integration problems frequently arise.
Evaluating the interaction between various subpayloads poses challenges due to the intricate nuances known primarily to their respective development teams. A frequent integration task, such as attaching external cables, has highlighted issues faced by NASA LaRC designers, who reported encountering spacing problems during this final step. Despite adherence to specifications, the actual process revealed insufficient space for hands, tools, or components, leading to delays in schedules, redesigns of equipment, and the need for improvised engineering solutions.
Simplified physical mock-ups are used for design verification and layout, with assembly procedures outlined in detailed step-by-step instructions. However, NASA LaRC payload designers have noted that the low fidelity of these mock-ups and the assembly documentation has led to significant issues arising in later project stages.
NASA LaRC payload designers aim to enhance technician training in assembly and maintenance procedures using virtual payload models. This approach allows for repetitive practice, enabling more individuals to gain proficiency in crucial assembly stages, particularly before physical mock-ups or actual devices are available. Early training is essential, especially for tasks that demand highly specialized skills, such as the precise release of delicate paraffin latches.
LaRC designers obtain CAD models of payload subsections from subcontractors during the early design phase, allowing them to explore assembly, layout, and integration before construction begins. Addressing changes in the initial project stages is significantly more cost-effective and efficient than later fixes. Given the tight timelines for payload development, the ability to rapidly test multiple design alternatives is crucial. Utilizing a virtual environment could provide the necessary capability to achieve this.
A hybrid virtual environment (VE) system allows designers to test configurations alongside assembly personnel, utilizing real tools and parts. This approach is hypothesized to be more effective for evaluating hardware designs and planning assembly than a purely virtual system.
Case Study: Payload Spacing Experiment
To assess the effectiveness of hybrid virtual environments (VEs) for NASA Langley Research Center (LaRC) assembly tasks, we developed an abstracted payload layout and assembly task for four LaRC payload designers. The designers received task information similar to their actual design evaluation process, allowing for a realistic discussion of approaches before executing the assembly procedure in the hybrid VE. Following the task execution, we conducted interviews to collect the designers' insights on the system's potential usefulness for their current payload assembly, testing, and integration tasks.
To gain insight into the complexities of payload assembly, we visited NASA's Langley Research Center, where engineers and technicians provided an overview of the various stages involved in payload development. During our visit, we learned about the challenges they frequently encounter, particularly in relation to the CALIPSO weather imaging satellite. Additionally, we examined a light imager unit known as the photon multiplier tube (PMT), which was presented in an engineering mock-up, excluding the central imager tube.
Figure 34 – Photon Multiplier Tube (PMT) box for the CALIPSO satellite payload. We used this payload subsystem as the basis for our case study.
Courtesy of NASA LaRC's CALIPSO project.
Figure 35 – VRML model of the PMT box.
We received CAD models of the PMT, and abstracted a task that was similar to many of the common assembly steps, such as attaching components and fastening cable connectors.
The PMT model, along with payloads A and B, was integrated into the virtual environment (VE) for the assembly task. The system conducted collision detection between the virtual payloads and real-object avatars, visually indicating any collisions by rendering the affected virtual object in red, as illustrated in Figure 36.
Figure 36 – Collisions between real objects (pipe and hand) and virtual objects (payload models) cause the virtual objects to flash red.
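The collision feedback described above can be sketched roughly as follows; the function and parameter names (detect_collision, draw, and so on) are hypothetical placeholders standing in for the system's collision-detection and rendering layers, not its actual code.

```python
# Schematic sketch of per-frame collision feedback: any virtual payload model
# that the reconstructed real-object avatars intersect is drawn tinted red.
FLASH_COLOR = (1.0, 0.0, 0.0)  # red

def render_payloads(payload_models, real_object_avatars, detect_collision, draw):
    """payload_models: virtual models; real_object_avatars: reconstructed real objects.
    detect_collision(model, avatars) -> bool and draw(model, tint=None) are assumed
    to be supplied by the collision-detection and rendering layers."""
    for model in payload_models:
        if detect_collision(model, real_object_avatars):
            draw(model, tint=FLASH_COLOR)   # flash red while the collision persists
        else:
            draw(model)
```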
Figure 37 – Parts used in the shield fitting experiment: PVC pipe prop, power cord, tongs (tool), and the outlet and pipe connector that was registered with the virtual model.
The objective was to engage with real objects, specifically by screwing a cylindrical shield, represented by a PVC pipe, into a pipe receptacle and connecting a power connector to an outlet within the shield. If participants needed extra help, tools were made available to facilitate the task. A diagram illustrating the process is also provided.
The designers were to determine how much space was required between the top of the PMT box and the bottom of payload A as shown in Figure 38.
Figure 38 – The task was to determine the space needed between the PMT and the payload above it (indicated by the red arrow) to perform the shield and cable fitting task.
Figure 39 – Cross-section diagram of the task. The red pipe and blue power cable had to be connected to their respective connectors at the center shaft of the virtual PMT box.
Figure 40 – The first step was to slide the pipe between the payloads and then screw it into the fixture. Figure 41 – 3rd person view of this step.
Figure 42 – After the pipe was in place, the next step was to fish the power cable down the pipe and plug it into the outlet on the table.
Figure 43 – 3rd person view of this step. Notice how the participant holds his hand nearly horizontal to avoid colliding with the virtual PMT box.
Figure 44 – Inserting the cable into the outlet was difficult without an appropriate tool, so tongs were used to complete the step. Figure 45 – 3rd person view of this step.
On March 12, 2002, four NASA LaRC payload designers and engineers performed the task in our reconstruction system. Before the experiment, each participant completed a survey (see Appendix C.1) that described the size and orientation of the various components and connectors, including the pipe (14 cm long, 4 cm in diameter) that needed to be attached to the receptor at the bottom of the PMT box cylinder, and asked:
• How much space between the PMT and payload A is necessary to perform the pipe insertion and power cable attachment procedures?
• How much space between the PMT and payload A would you actually allocate (given typical payload layout space constraints) for the pipe insertion and power cable attachment procedures?
After completing the survey, each participant performed the pipe insertion and power cable attachment procedure in the reconstruction system:
• First, participants donned the HMD and walked around the VE to get used to it.
• They evaluated the collision detection by positioning their hands near the PMT box and payload A and observing the red rendering that indicated collisions with the virtual objects.
• Then, they picked up the pipe and eased it into the center cylindrical assembly while trying to avoid colliding with either payload A or the PMT box.
• After the pipe was lowered into the cylindrical shaft of the PMT, they snaked the power cord down the tube and inserted it into the outlet
The experimenter was able to dynamically modify the distance between the PMT and payload A based on the participant's requests for more or less space. By adjusting the height of payload A, the experimenter facilitated the rapid evaluation of various spatial configurations between the two payload subassemblies.
The post-experience survey assessed participants' feedback regarding the necessary space between the PMT and payload A, as determined during their time in the virtual environment (VE). A summary of the survey results can be found in Table 11, with detailed responses from all participants included in Appendix C.3, while the survey itself is available in Appendix C.2.
Results
Given that the pipe was 14 cm long and 4 cm in diameter:
Table 11 – LaRC participant responses and task results (participants #1 – #4)
(Pre-experience) How much space is necessary between Payload A and the PMT?
(Pre-experience) How much space would you actually allocate? 21 cm, 16 cm, 20 cm, 15 cm
Actual space required (determined in VE): 15 cm, 22.5 cm, 22.3 cm, 23 cm
(Post-experience) How much space would you actually allocate after your VE experience? 18 cm, 16 cm
Space is limited, and the engineers were particularly conservative with its use. This was notably evident for participants #2 and #4, who had hands-on experience with payload assembly. In contrast, participant #3, a flight software engineer, had less practical experience in installing payload hardware.
All participants successfully completed the task, but the approach varied. Participant #1 managed to insert the stiff power cable into the outlet without additional tools. However, to emphasize the potential need for unforeseen tools during assembly or repair, we provided a more flexible cable for participants #2 and #3.
These participants reported difficulty completing the task and expressed a need for a tool to assist with cable insertion. After being provided with a set of tongs, they successfully completed the power cable insertion, as illustrated in Figure 45. Using the tool changed the spacing required between the PMT and Payload A to avoid collisions; this interactive adjustment allowed new spacing layouts to be tested within the virtual environment, and the spacing was increased from 14 cm to 24 cm.
The flexible power cable was difficult to thread through the narrow pipe and connect to the outlet without assistance, as the participant's hands could not fit inside the pipe. Even when connecting the power cable prior to inserting the pipe, spacing issues persisted. Additionally, the virtual PMT box obstructed the power cable attachment, regardless of the pipe's insertion status.
Conclusions
Recap of Results
Our real-time object reconstruction algorithm utilizes graphics hardware to create a view-dependent sampling of real objects' visual hulls from multiple camera views. This enables the generation of visually accurate avatars and dynamic objects within immersive virtual environments. Notably, the system operates without the need for extra trackers or prior object data, facilitating seamless interaction between these objects and the surrounding virtual space.
We have enhanced our real-time image-based object reconstruction system to enable collision detection between real and virtual objects, implementing responsive interactions. This advancement necessitated the development of new algorithms designed to manage the collisions between polygonal virtual models and dynamic real-object avatars.
User Studies on Real Objects in VEs
We then conducted studies to evaluate the advantages that manipulating real objects could provide over purely virtual objects in cognitive manual tasks.
Our results indicate that interacting with real objects in a virtual environment (VE) significantly enhances task performance compared to interacting with purely virtual objects. This improvement is attributed to the comprehensive interaction affordances of real objects and the haptic feedback they provide to participants.
For developers of virtual environments (VEs) that focus on simulation, user training, or spatial cognition, incorporating interaction with real objects is highly recommended. This approach enhances participants' performance, making it more closely align with their real-world task execution.
The results did not show a significant difference in participant-reported sense-of-presence for those represented by a visually faithful personalized self-avatar over those represented by generic self-avatars.
Our research indicates that the key factor for avatars to convey presence is kinetic fidelity, while visual fidelity, though significant, appears to be less critical. We propose that participants associate visual fidelity with kinetic fidelity; when they encounter a highly realistic avatar, they anticipate it to exhibit realistic movements as well.
Further research and an enhanced presence questionnaire could clarify the impact of visually faithful avatars. Participants who were represented by these avatars demonstrated a preference for personalization. However, our study did not address the extent of this preference or the potential benefits it may provide.
Our research indicates that assembly verification tasks can be significantly enhanced by interacting with real objects in a virtual environment (VE). Collaborating with NASA LaRC, we have demonstrated that our reconstruction system offers considerable advantages in hardware layout and assembly verification. This system facilitates intricate interactions with virtual models using actual tools and components, allowing personnel to dedicate more time to training on delicate operations. Additionally, designers can effectively assess payload development challenges related to assembly verification, layout, and integration.
Future Work
The current prototype facilitates fundamental research on virtual environment (VE) interaction issues. Future efforts will concentrate on enhancing performance, refining outcomes, and exploring additional research avenues in VE interaction.
Reconstruction Algorithm
Future work on the reconstruction algorithm includes porting the system to a networked cluster of PCs, speed and calibration improvements, and correlating visual hulls to real objects.
The current reliance on the SGI Reality Monster graphics supercomputer is hindered by its high cost, limited availability, and infrequent upgrades, rendering it unsuitable for widespread use. To achieve mass adoption, cost-effectiveness is crucial. The competitive forces in the commodity hardware market have resulted in lower prices and more frequent upgrades, making networked PC-based systems increasingly appealing. Significant advancements in consumer-grade computers, networking, and graphics cards are paving the way for such solutions. Furthermore, emerging image-based scene reconstruction algorithms are beginning to leverage networked PCs, as seen in projects like Virtualized Reality, 3-D Tele-Immersion, and Image-Based Visual Hulls.
A standard hardware configuration includes a dedicated network of PCs connected through high-bandwidth interfaces, such as Firewire, to high-quality cameras for scene capture. Each PC is equipped with sufficient computational power to handle image processing tasks. The processed images are then transmitted, potentially using compression, to a central PC responsible for reconstruction.
Modern high-bandwidth PC buses, like AGP 4x and AGP 8x, offer the throughput needed for real-time image uploads to graphics card memory. The algorithm's graphics demands are relatively low, allowing current systems to deliver interactive performance efficiently.
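As a rough sanity check on this claim, the back-of-the-envelope calculation below uses assumed camera parameters (five 640x480 RGB cameras at 15 Hz, which are illustrative figures rather than values from this document) and compares the resulting upload rate against the peak AGP 4x bandwidth.

```python
# Back-of-the-envelope bandwidth check with assumed camera parameters.
width, height, bytes_per_pixel = 640, 480, 3   # 640x480 RGB (assumption)
fps, n_cameras = 15, 5                         # 15 Hz, five cameras (assumption)

upload_rate = width * height * bytes_per_pixel * fps * n_cameras   # bytes per second
agp_4x_peak = 1.06e9                                               # ~1.06 GB/s peak for AGP 4x

print(f"camera upload: {upload_rate / 1e6:.0f} MB/s "
      f"({100 * upload_rate / agp_4x_peak:.1f}% of AGP 4x peak)")
```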
Porting the algorithm to PCs enables us to leverage the enhanced performance and features of evolving game-graphics hardware. The latest generations of graphics cards, which offer improved performance and advanced functions like pixel shaders, would allow the reconstruction algorithm to deliver faster results with higher resolution.
To enhance the efficiency of the reconstruction algorithm, we focus on minimizing the fill rate, which refers to the number of pixels processed during scan conversion. One effective optimization strategy involves calculating bounding boxes for clusters of object pixels within the object-pixel maps. This approach facilitates quick accept/reject decisions during the plane sweeping process. Because the fill rate is currently the primary bottleneck in reconstruction, addressing it can yield significantly faster results.
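A minimal sketch of the bounding-box idea follows, assuming the object-pixel map is available as a binary NumPy array; the function name and representation are illustrative, not the system's actual code.

```python
# Sketch: bounding box of the object pixels in one camera's object-pixel map,
# usable for quick accept/reject tests during the plane sweep.
import numpy as np

def object_pixel_bbox(object_mask: np.ndarray):
    """Return (min_row, min_col, max_row, max_col) of the True pixels, or None if empty."""
    rows = np.any(object_mask, axis=1)
    cols = np.any(object_mask, axis=0)
    if not rows.any():
        return None                                   # no object pixels in this view
    min_row, max_row = np.where(rows)[0][[0, -1]]
    min_col, max_col = np.where(cols)[0][[0, -1]]
    return int(min_row), int(min_col), int(max_row), int(max_col)
```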
To enhance system accuracy, future efforts should focus on refining the input to the reconstruction algorithm, as image segmentation results are particularly sensitive to shadows and high-frequency areas. Current methods, using segmentation thresholds and background images, often yield imprecise results: misclassified object pixels enlarge the visual hull, while misclassified background pixels create gaps. Applying more advanced image processing to the camera inputs and adopting a more precise camera calibration algorithm can significantly improve accuracy and reduce noise in the reconstructions.
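The kind of threshold-based background subtraction being described can be sketched as below; the threshold value and image shapes are assumptions for illustration, not the system's parameters.

```python
# Minimal background-subtraction segmentation: pixels that differ from a stored
# background image by more than a threshold are classified as object pixels.
import numpy as np

def segment(frame: np.ndarray, background: np.ndarray, threshold: int = 25) -> np.ndarray:
    """frame, background: HxWx3 uint8 images; returns an HxW boolean object-pixel map."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff.max(axis=2) > threshold    # max channel difference exceeds the threshold
```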
Identifying the visual hull of a particular real object would enhance collision response and tracking accuracy. By discerning which real object a virtual entity interacts with, the system could tailor its responses accordingly. Additionally, tracking real objects would allow high-fidelity virtual models, rendered in place of the standard real-object avatar, to collide with virtual counterparts, providing users with a more immersive experience.
Real – Virtual Object Interaction
The collision detection and response algorithms provided plausible responses for a basic physics simulation. Higher-fidelity responses could require new algorithms for computing the necessary collision data.
Our existing algorithm presents significant potential for performance enhancement. We anticipate substantial gains by optimizing the utilization of textures and minimizing OpenGL state changes. Implementing these adjustments across the entire buffer, rather than on a per-primitive level, will help distribute the costs of state changes, which are more resource-intensive than triangle setup and rendering.
Collision detection accuracy can be improved by optimizing framebuffer resolution and refining camera calibration. The accuracy of collision detection depends on the viewport size used during volume-querying, so adjusting the viewport dimensions based on the rendered primitive can establish a minimum spatial resolution for collision detection. Additionally, more precise camera calibration methods would further improve collision detection accuracy.
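To make the resolution argument concrete, the snippet below divides an assumed query-volume extent by an assumed viewport size; both numbers are placeholders, not values from this work.

```python
# Sketch of the spatial-resolution relationship for image-space collision queries.
extent_cm = 30.0        # side length of the queried region, in cm (assumption)
viewport_pixels = 400   # viewport width/height used for volume-querying (assumption)

resolution_cm = extent_cm / viewport_pixels
print(f"minimum detectable separation ~ {resolution_cm:.3f} cm per pixel")
```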
Enhancing collision response can be achieved by developing advanced algorithms that accurately determine penetration depth, surface normals, and collision points. The existing algorithms provide only estimated results, highlighting the need for improvement. Volume-querying with different primitives during collision response may yield more precise collision information and better overall outcomes.
The current challenge in addressing collisions stems from our inability to backtrack the movements of real objects. By retaining previous camera images and tracking the real objects within those images, we could facilitate backtracking. Analyzing the shape and motion of a tracked object over multiple frames would allow us to extract crucial information, including velocity, acceleration, rotation, and center of mass. This enhanced data would significantly improve simulations, leading to more precise collision responses.
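A simple sketch of how such per-object motion estimates could be derived is shown below; the tracked center-of-mass positions and frame rate are hypothetical, and finite differences stand in for whatever estimator an actual implementation would use.

```python
# Sketch: velocity and acceleration of a tracked object's center of mass,
# estimated by finite differences over the last three frames (hypothetical data).
import numpy as np

positions = np.array([[0.10, 0.50, 0.30],   # center of mass two frames ago (meters)
                      [0.12, 0.48, 0.30],   # previous frame
                      [0.15, 0.45, 0.31]])  # current frame
dt = 1.0 / 15.0                             # frame period at 15 reconstructions/s (assumption)

velocity = (positions[-1] - positions[-2]) / dt
acceleration = (positions[-1] - 2 * positions[-2] + positions[-3]) / dt ** 2
print("velocity (m/s):", velocity, "acceleration (m/s^2):", acceleration)
```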
Expanding interactions between real objects and synthetic environments paves the way for a new hybrid reality. Identifying the initial applications of this technology can be challenging, but integrating real objects effectively addresses interaction limitations that have previously hindered the effectiveness of virtual environments (VEs) in various applications.
Future research should explore additional applications beyond assembly verification that could leverage the integration of dynamic real objects into virtual environments (VEs). We propose that various VE applications stand to gain significantly from this incorporation.
• Training – Handling real tools and parts while interacting with virtual objects would allow training VEs to simulate actual conditions more accurately.
• Telepresence – Rendering novel views of real participants and objects within a virtual scene to other participants could improve the interpersonal communication effectiveness.
• Phobia treatment – Incorporating real objects into phobia treatment can significantly enhance the effectiveness of virtual environments (VEs). For instance, the use of a tracked furry toy spider in arachnophobia therapy demonstrated remarkable results, as the combination of visual and tactile feedback from the real object interacting with virtual elements improved the overall realism of the experience.
User Study
The third and possibly most interesting area of future research is the ability to study avatars and interaction in immersive VEs.
Bibliography
[Abdel-Aziz71] Y Abdel-Aziz and H Karara, Direct Linear Transformation from Comparator Coordinates Into Object Space Coordinates in Close-Range Photogrammetry Proceedings of the
Symposium on Close-Range Photogrammetry Falls Church, VA: American Society of
[Arthur00] K Arthur, “Effects of Field of View on Performance with Head-Mounted Displays”. Department of Computer Science, UNC-Chapel Hill, 2001, Unpublished dissertation.
[Baba00] S Baba, H Saito, S Vedula, K.M Cheung, and T Kanade. Appearance-Based Virtual-View Generation for Fly Through in a Real Dynamic Scene. In VisSym '00 (Joint Eurographics – IEEE TCVG Symposium on Visualization), May, 2000.
[Baciu99] G Baciu, W Wong and H Sun, “RECODE: An Image-based Collision Detection Algorithm”, Journal of Visualization and Computer Animation, Vol 10, No 4, 1999 pp 181-192.
[Badler99] N Badler, R Bindiganavale, J Bourne, J Allbeck, J Shi, and M Palmer Real Time Virtual Humans Proceedings of International Conference on Digital Media Futures British Computer Society, Bradford, UK, April, 1999.
[Banerjee99] A Banerjee, P Banerjee, N Ye, and F Dech. Assembly Planning Effectiveness using Virtual Reality. Presence, 8(7):204-217, 1999.
[Bouguet98] J Bouguet Camera Calibration from Points and Lines in Dual-Space Geometry. Technical Report, California Institute of Technology [XXX]
[Bowman97] D Bowman and L Hodges. An Evaluation of Techniques for Grabbing and Manipulating Remote Objects in Immersive Virtual Environments. In M Cohen and D Zeltzer (Eds.), Proceedings of the Symposium on Interactive 3-D Graphics, 1997.
[Boyles00] M Boyles and S Fang “Slicing-Based Volumetric Collision Detection”, ACM Journal of Graphics Tools, 4(4): 23-32, 2000.
[Breen95] D Breen, E Rose, R Whitaker, Interactive Occlusion and Collision of Real and Virtual Objects in Augmented Reality, 1995.
[Brooks99] F Brooks Jr., 1999: "What's Real About Virtual Reality?" IEEE Computer Graphics and Applications.
[Bush99] T Bush, (1999) Gender Differences in Cognitive Functioning: A Literature Review The Cyber- Journal of Sport Marketing, Vol 1.
[Butcher00] J Butcher, C Bass, and L Danisch Evaluation of Fiber-Optic Sensing Band For the Characterization of Deformation Contours Southern Biomedical Engineering Conference 2000
[Carr98] J Carr, W Fright, A Gee, R Prager and K Dalton 3-D Shape Reconstruction using Volume Intersection Techniques In IEEE International Conference on Computer Vision Proceedings, 1095-
[Chien86] C Chien and J Aggarwal Volume/Surface Octrees for the Representation of Three- Dimensional Objects Computer Vision, Graphics, and Image Processing, volume 36, No 1, 100-
[Daniilidis00] K Daniilidis, J Mulligan, R McKendall, G Kamberova, D Schmid, R Bajcsy Real- Time 3-D Tele-immersion In The Confluence of Vision and Graphics, A Leonardis et al (Ed.), Kluwer Academic Publishers, 2000.
[Edelsbrunner92] H Edelsbrunner and E Mucke Three-Dimensional Alpha Shapes 1992 Workshop on Volume Visualization pp 75-82, 1992.
[Ehmann00] S Ehmann and M Lin, Accurate Proximity Queries Between Convex Polyhedra By Multi-Level Voronoi Marching Proceedings of the International Conference on Intelligent Robots and Systems, 2000.
[Faugeras93a] O Faugeras, Three Dimensional Computer Vision, The MIT Press, 1993.
[Faugeras93b] O Faugeras, T Vieville, E Theron, J Vuillemin, B Hotz, Z Zhang, L Moll, P Bertin,
H Mathieu, P Fua, G Berry, and C Proy Real-time Correlation-Based Stereo: Algorithm, Implementations and Applications INRIA Technical Report RR-2013.
[Garau01] M Garau et al. The Impact of Eye Gaze on Communication Using Humanoid Avatars. Proceedings of the SIG-CHI Conference on Human Factors in Computing Systems, 2001.
[Hand97] C Hand A Survey of 3-D Interaction Techniques, Computer Graphics Forum, 16(5), pp 269-
[Heeter92] C Heeter, Being There: The Subjective Experience of Presence, PRESENCE, Teleoperations and Virtual Environments, Volume 1, Number 2, pp 262-271.
[Hilton00] A Hilton, D Beresford, T Gentils, R Smith, W Sun, and J Illingworth. Whole-Body
Modelling of People from Multiview Images to Populate Virtual Worlds, The Visual Computer, 16
[Hinckley94] K Hinckley, R Pausch, J Goble, and N Kassell, Passive Real-World Interface Props for Neurosurgical Visualization. CHI94, pp 452-458.
[Hoffman97] H Hoffman et al. Virtual Reality and Tactile Augmentation in the Treatment of Spider Phobia, 1997.
[Hoffman98] H Hoffman. Physically Touching Virtual Objects Using Tactile Augmentation Enhances the Realism of Virtual Environments. Proceedings of the IEEE Virtual Reality Annual International Symposium '98, Atlanta, GA, pp 59-63. IEEE Computer Society, Los Alamitos, California.
[Hoff01] K Hoff, A Zaferakis, M Lin, and D Manocha, Fast and Simple 2-D Geometric Proximity Queries Using Graphics Hardware, 2001 ACM Symposium on Interactive 3-D Graphics pp 145-
[Hollnagel02] E Hollnagel, (2002) Handbook of Cognitive Task Design To be published by Lawrence Erlbaum Associates, Inc.
[Insko01] B Insko, Passive Haptics Significantly Enhances Virtual Environments, Department of Computer Science, UNC-Chapel Hill, 2001, Unpublished dissertation.
[Levoy00] M Levoy, K Pulli, B Curless, S Rusinkiewicz, D Koller, L Pereira, M Ginzton, S. Anderson, J Davis, J Ginsberg, J Shade, and D Fulk “The Digital Michelangelo Project: 3-D Scanning of Large Statues.” Proceedings of ACM SIGGRAPH 2000 pp 131-144, 2000.
[Lindeman99] R Lindeman, J Sibert, and J Hahn “Hand-Held Windows: Towards Effective 2D Interaction in Immersive Virtual Environments.” In IEEE Virtual Reality, 1999.
[Kakadaris98] I Kakadiaris and D Metaxas Three-Dimensional Human Body Model Acquisition from Multiple Views Int’l Journal of Computer Vision 30, 1998.
[Kutulakos00] K Kutulakos. Approximate N-View Stereo. Proceedings, 6th European Conference on Computer Vision, Dublin, Ireland, pp 67-83, 2000.
[Laurentini94] A Laurentini. The Visual Hull Concept for Silhouette-Based Image Understanding.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 16, No 2, 150-162, February 1994.
[Lin98] M Lin, S Gottschalk, Collision Detection between Geometric Models: A Survey Proc IMA
Conference on Mathematics of Surfaces 1998.
[Lok01] B Lok. Online Model Reconstruction for Interactive Virtual Environments. Proceedings 2001 Symposium on Interactive 3-D Graphics, Chapel Hill, N.C., 18-21 March 2001, 69-72, 248.
[Maringelli01] F Maringelli, J McCarthy, A Steed, M Slater and C Umiltà. Shifting Visuo-Spatial Attention in a Virtual Three-Dimensional Space. Cognitive Brain Research, Volume 10, Issue 3, January 2001, 317-322.
[Matusik00] W Matusik, C Buehler, R Raskar, S Gortler and L McMillan Image-Based Visual Hulls In SIGGRAPH 00 Proceedings, Annual Conference Series, pages 369-374.
[Matusik01] W Matusik, C Buehler, and L McMillan, "Polyhedral Visual Hulls for Real-TimeRendering", Eurographics Workshop on Rendering 2001.
[Meehan01] M Meehan, Physiological Reaction as an Objective Measure of Presence in Virtual Environments, Department of Computer Science, UNC-Chapel Hill, 2001, Unpublished dissertation.
[Moezzi96] S Moezzi, A Katkere, D Y Kuramura, and R Jain. Reality Modeling and Visualization from Multiple Video Sequences. IEEE Computer Graphics and Applications, 16(6), 58-63, 1996.
[Mortensen02] J Mortensen, V Vinayagamoorthy, M Slater, A Steed, B Lok, and M Whitton,
“Collaboration in Tele-Immersive Environments”, Eighth Eurographics Workshop on Virtual Environments (EGVE 2002) on May 30-31, 2002.
[Niem97] W Niem "Error Analysis for Silhouette-Based 3D Shape Estimation from Multiple Views", Proceedings on International Workshop on Synthetic - Natural Hybrid Coding and Three Dimensional Imaging (IWSNHC3DI'97), Rhodos, 6-9 September 1997.
[Pabst02] T Pabst and L Weinand, "PC Graphics Beyond XBOX – nVidia Introduces GeForce4", Retrieved March 28, 2002 http://www6.tomshardware.com/graphic/02q1/020206/index.html.
[Pertaub01] D-P Pertaub, M Slater, and C Barker, “An Experiment on Fear of Public Speaking in Virtual Reality,” Medicine Meets Virtual Reality 2001, pp 372-378, J D Westwood et al (Eds) IOS
[Potmesil87] M Potmesil, Generating Octree Models of 3-D Objects from Their Silhouettes in a Sequence of Images Computer Vision, Graphics and Image Processing Vol 40, 1-29, 1987.
[Raskar98] Raskar, R., Welch, G., Cutts, M., Lake, A., Stesin, L., and Fuchs, H “The Office of the Future:
A Unified Approach to Image-Based Modelling and Spatially Immersive Displays,” Computer Graphics M F Cohen Orlando, FL, USA (July 19 - 24), ACM Press, Addison-Wesley: 179-188.
[Razzaque01] S Razzaque, Z Kohn, and M Whitton. "Redirected Walking," Proceedings of Eurographics 2001.
[Rehg94] J Rehg and T Kanade. Digiteyes: Vision-Based Hand Tracking for Human-Computer Interaction. In J Aggarwal and T Huang (Eds.), Proceedings of the Workshop on Motion of Non-Rigid and Articulated Objects, pp 16-22. IEEE Computer Society Press, 1994.
[Satalich95] G Satalich, “Navigation and Wayfinding in Virtual Reality: Finding Proper Tools and Cues to Enhance Navigation Awareness”, Masters Thesis, University of Washington.
[Seitz97] S Seitz and C Dyer, Photorealistic Scene Reconstruction by Voxel Coloring, Proceedings of the Computer Vision and Pattern Recognition Conference, 1997, 1067-1073.
[Simone99] L Simone Poser 4 (Review) Retrieved March 26, 2002 http://www.zdnet.com/products/stories/reviews/0,4161,2313739,00.html
[Slater93] M Slater and M Usoh. The Influence of a Virtual Body on Presence in Immersive Virtual Environments. Proceedings of the Third Annual Conference on Virtual Reality, 1993.
[Slater94] M Slater and M Usoh Body Centred Interaction in Immersive Virtual Environments, in N. Magnenat Thalmann and D Thalmann, editors, Artificial Life and Virtual Reality, pages 125-148, John Wiley and Sons, 1994.
[Sutherland65] I Sutherland. The Ultimate Display. Proceedings of the IFIP Congress, 1965.
[Thalmann98] D Thalmann. The Role of Virtual Humans in Virtual Environment Technology and Interfaces, 1998.
[Turk94] G Turk and M Levoy. Zippered Polygon Meshes from Range Images. Proceedings of ACM SIGGRAPH 94, 1994.
[Usoh99] M Usoh, K Arthur, et al. Walking > Walking-in-Place > Flying, in Virtual Environments.
Proceedings of SIGGRAPH 99, pages 359-364, Computer Graphics Annual Conference Series,
[Usoh00] M Usoh, E Catena, S Arman, and M Slater, Using Presence Questionnaires in Reality,
Presence: Teleoperators and Virtual Environments, 9(5) 497-503.
[Ward01] M Ward (2001). EDS Launches New Tool To Help Unigraphics CAD/CAM Software Users With Earlier Detection Of Product Design Problems. Retrieved March 26, 2002 http://www.apastyle.org/elecgeneral.html.
[Wechsler39] Wechsler, D The Measurement of Adult Intelligence, 1st Ed., Baltimore, MD: Waverly Press, Inc.
[Welch96] R Welch et al. The Effects of Pictorial Realism, Delay of Visual Feedback, and Observer Interactivity on the Subjective Sense of Presence. Presence: Teleoperators and Virtual Environments, 1996.
[Zachmann01] G Zachmann and A Rettig. Natural and Robust Interaction in Virtual Assembly Simulation. Eighth ISPE International Conference on Concurrent Engineering, 2001.
Pre-Experience: Health Assessment (A.2), Kennedy-Lane Simulator Sickness (A.3), Guilford-Zimmerman Spatial Ability (A.4)
During Experience: Participant Experiment Record (A.5)
Post-Experience: Interview (A.7), Kennedy-Lane Simulator Sickness (A.3), Steed-Usoh-Slater Presence Questionnaire (A.8)
Task Performance and Presence in Virtual Environments
Introduction and purpose of the study:
We invite you to take part in a study examining the impact of visually faithful avatars and natural interaction techniques on task performance in virtual environments (VEs) This research aims to provide valuable insights for VE researchers and practitioners, enhancing the effectiveness of VEs in training individuals for real-world scenarios.
The principal investigator is Benjamin Lok (UNC Chapel Hill, Department of Computer Science, 361 Sitterson Hall, 962-1893, lok@cs.unc.edu). The faculty advisor is Dr Frederick P Brooks Jr. (962-1931, brooks@cs.unc.edu).
What will happen during the study:
Participants will attend a one-hour session in the laboratory, where they will engage in a simple task within a virtual environment (VE). During the session, they will wear a helmet with two small screens positioned approximately three inches from their eyes, along with headphones to receive instructions. In the traditional VE condition, participants will don data gloves, while in the hybrid condition, they will wear generic white gloves. The study will utilize computers to track hand, head, and body movements throughout the VE experience, and both video and audio recordings will be made. Participants will also complete questionnaires to assess their perceptions and feelings during and after the VE experience.
Approximately 30 people will take part in this study
We are committed to safeguarding your privacy by ensuring that your name is not used in any data recordings or research reports; instead, a code number will be assigned. Additionally, no identifiable images from videotapes will be included in any presentations without your consent. These videotapes will be securely stored for a duration of approximately two years before being destroyed.
Some individuals may experience mild symptoms such as disorientation, nausea, or dizziness while using virtual environment systems, akin to motion sickness or sensations felt during wide-screen movies and theme park rides. These effects are generally not expected to be severe or to persist after leaving the laboratory. Participants are encouraged to discontinue the experiment at any point if they feel uncomfortable.
You have the right to decide whether or not to participate in this study, and to withdraw from the study at any time without penalty.
Participants in this study will receive a payment of $10 for their involvement, irrespective of whether they complete the task. However, individuals who do not fulfill the specified criteria on the signup sheet or fail to meet the health, stereo vision, and comfort requirements assessed on-site during the experiment will not be compensated.
The Academic Affairs Institutional Review Board (AA-IRB) of the University of North Carolina at Chapel Hill has granted approval for this study. If you have any questions or concerns regarding your rights as a participant, please reach out to Barbara Goldman, Chair of the AA-IRB, at CB#4100, 201 Bynum Hall, UNC-CH, Chapel Hill, NC 27599-4100, or contact her by phone at (919) 962-7761, or via email at aa-irb@unc.edu.
I understand that this is a research study to measure the effects of avatar fidelity and interaction modality on task performance and sense-of-presence in virtual environments.
I understand that if I agree to be in this study:
• I will visit the laboratory one time for sessions lasting approximately one hour.