Saturday, July 10, 2004

OT: Beyond the dimension

O'Reilly Network: Beyond the dimension

by Jono Bacon
Jul. 7, 2004

We are facing an interesting time in the Open Source desktop world. Not only are a number of interesting technologies being developed to make our computers work more transparently, but a technology has recently been Open Sourced that provides a playground for a new way of thinking about the Open Source desktop: Project Looking Glass.

Project Looking Glass (PLG) is a technology developed by Sun to provide a 3D desktop environment. The environment gives you the ability to perform simple operations such as flipping windows, changing the perspective and view of an object, and other functions. Sun created the software to explore the possibilities for 3D based applications and a 3D based desktop, and although it is fairly useless in its current incarnation, the prototype provides a usable framework for creating 3D applications and experimenting with a new way of interacting with software.

The aim of this article is to discuss some ideas and concepts for making use of a 3D environment. Before I continue, there should be a few disclaimers however. First of all, I am no usability expert, and I am actually fairly cynical about certain aspects of usability theory. As such, you should take my ideas here as simply ideas - they were in no way researched and are not backed up with data to prove their usefulness. Secondly, the ideas here can apply to any 3D environment or software, and not specifically PLG. Feel free to make use of these ideas in your own 3D environment.

I believe that a 3D environment could be useful. There has been much discussion on the net about the worth of a 3D interface, particularly considering that it is confined within the remit of your 2D screen and typical 2D input devices: keyboard and mouse. Although I share some of this cynicism, I also believe that people can perceive 3D sufficiently on a screen to interact with it. You only have to look at how we perceive 3D in video games and movies to see this. I think the biggest challenges that we face are not with perception, but with the input and architecture of the environment.


I think it is fair to say that it is unreasonable to expect users of a 3D interface to go out and buy a special input device for their computer. We are not aiming to build a Minority Report type system here; the aim is to create a level of useful 3D interaction that is as familiar and intuitive as possible. I do believe that the mouse is useful here.

3D interfaces are based around three axes:

* X axis. This is a width line from left to right and vice versa
* Y axis. This is a height line from top to bottom and vice versa
* Z axis. This is a depth line from far to near and vice versa

When considering our input mechanism, we need to take into account these axis requirements. In addition to this we need to consider the selection requirements. I believe that selection will be as simple in the 3D space as it is in a 2D space; you need to be able to select something (such as loading an application when double clicking an icon) and you need to be able to hold something (such as dragging an icon by single clicking, dragging and releasing). The only other possible requirement is a context menu, but I am rather skeptical of these, and I think a better solution can be achieved in the 3D space with semi-transparent overlays.

With these considerations, one choice of input could be:

* Left Mouse Button. Click and hold to move the X axis as if a camera was panning. Double click to flip to the other side (as if you had a mirror image looking back at you)
* Right Mouse Button. Click and hold to move the Y axis as if a camera was panning. Double click to flip to the other side (as if you had a mirror image looking back at you)
* Middle Mouse Button. Selection button. Double click to select an object to work with, and single click to select an object and move it
* Left+Right Buttons together/Scroll Wheel. Click and hold to move the Z axis as if a camera was panning. Double click to flip to the other side (as if you had a mirror image looking back at you). The scroll wheel will move you forward when you scroll forward and back when you scroll back

Although I have suggested which button can do what, these combinations can obviously be changed. The main point I am making is that you need a selection button and a means to control each axis. Some people have suggested using a Shift/Ctrl/Alt key in combination with the mouse, but I think this feels a little clumsy.
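As a rough illustration of the mapping above, here is a minimal Python sketch. Everything in it (the `Button` and `Camera` names, the `drag` and `scroll` functions, and the convention that scrolling forward increases Z) is a hypothetical assumption made for the example; it is not taken from PLG or any other toolkit.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Button(Enum):
    LEFT = auto()
    RIGHT = auto()
    MIDDLE = auto()
    LEFT_AND_RIGHT = auto()


@dataclass
class Camera:
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


# Which axis each held button pans, following the list above.
# The table is the only thing to change if the buttons are reassigned.
PAN_AXIS = {
    Button.LEFT: "x",
    Button.RIGHT: "y",
    Button.LEFT_AND_RIGHT: "z",
}


def drag(camera: Camera, button: Button, amount: float) -> Camera:
    """Pan the camera along the axis bound to the held button."""
    axis = PAN_AXIS.get(button)
    if axis is None:
        # The middle button is reserved for selection, so it never pans.
        return camera
    setattr(camera, axis, getattr(camera, axis) + amount)
    return camera


def scroll(camera: Camera, notches: int, step: float = 1.0) -> Camera:
    """Move along Z: positive notches go forward, negative go back (assumed)."""
    camera.z += notches * step
    return camera
```

Keeping the button-to-axis table separate from the movement code reflects the point that the exact combinations can change, as long as there is one selection button and a way to control each axis.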

3D representation of objects

The 3D interface will never amount to anything if we don't consider some specific use cases and how the interface can be best used. I think the key to defining 'best used' is to clearly separate out 2D and 3D functions. I see no point in making everything 3D; some things are inherently 2D (such as creating a word processed document) and the interface should allow you to edit your document in a 2D window as if you were using KDE/GNOME.

I think the true value of 3D comes in when we consider how we interact with objects. A while back a friend of mine told me about John Siracusa's analysis of the spatial Finder, and I found his commentary on how we interact with objects interesting. A 3D interface really allows us to take this concept to the next level - in the 3D space we can truly interact with the object and not simply interact with iconic representations of objects.

Let us take, for example, a file. In most current GUIs, a file is represented by an icon. This icon can be interacted with in the sense of moving it to different locations and clicking on it to load the file into a viewer. In the 3D space this file could literally be an accurate representation of the file itself. In this sense we could represent some of the following types of file:

* Paper Document. A document would look like a paper document, complete with the text of the document displayed on the front. We could use the 3D space to visualise the depth of the document with the number of pages; a document with 600 pages would look a lot fatter than a 3 page document. This way you can visualise what kind of document you are looking for and how long it will take to read. We could also use the paper size and shape to represent the kind of document - a business card will look different to an A3 poster for example
* Video. A video file could be represented by a 3D TV icon that is actually playing the video while you are navigating your files. I see no point in distinguishing between one video format and another. I did consider the value of having a Quicktime video playing on a Mac icon, but the user should not need to care about this. A video is a video and it is up to the player to care about file formats
* Devices. Devices could be really interesting. If you plug a digital camera into your USB port, you should see a digital camera appear on the screen. The digital camera icon should then be selectable (the view will then zoom into the back of the camera), and you can flick through the pictures on the on-screen camera's display. This ties together the concept of taking pictures off a device - the user will naturally want to look at the 3D representation of the device to make this happen. The current method of selecting a drive, or of the pictures coming off automatically, is rather unintuitive; the user needs to interact with a virtual representation of the device

Some icons will obviously be 2D by their very nature. A .png or .jpg image is a flat 2D image and is represented as such, but the key is in providing a realistic representation to the user of what type of content the object is. As an example, the user needs to see an intrinsic link between the document they type into and the document that comes out of their printer.
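The paper document idea above, where thickness reflects page count, could be sketched roughly as follows in Python. The figure of ten sheets per millimetre, the minimum depth, and the function name `document_depth_mm` are all assumptions made for illustration, not measurements or part of any existing desktop.

```python
SHEETS_PER_MM = 10  # assumed: roughly ten sheets of office paper per millimetre


def document_depth_mm(pages: int, min_depth_mm: float = 0.5) -> float:
    """Depth to give a document's 3D model, based on its page count.

    A minimum depth keeps very short documents visible and clickable.
    """
    if pages < 0:
        raise ValueError("pages cannot be negative")
    return max(min_depth_mm, pages / SHEETS_PER_MM)
```

With these assumed numbers, a 600 page document is drawn far thicker than a 3 page one, which simply falls back to the minimum depth.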

Application use cases

Before we can consider any kind of development effort, we need to come up with some ideas for how 3D applications will work. We need to formulate these ideas into use cases that can be clearly discussed and debated over. Here are some ideas:

File management

If there is one thing that humans seem to have no problem understanding, it is drawers, cupboards, fridges and other square boxes with a door on the front. We also understand pigeon holes, boxes, containers and other methods of putting one object in another. We also innately understand that if you put two objects in a box you only need to move the box to move both objects. This can be useful for dealing with directories and moving files around.

I think what we need to create in this kind of interface is a number of visual representations of real world storage containers. As an example, a hard disk could be represented as an office/storage room (we need to visually suggest that the hard disk is bigger than anything on it, so we need to visually represent the actual disk as a larger room). Within this room we then have a number of storage cabinets (directories) in which the files can be stored. Moving a cabinet from one room to another should be as simple as dragging it over from one room to the other. With the metaphor of cabinets we can also have different types of storage container for different types of information. A typical My Documents type directory could be a filing cabinet for example.
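The rooms and cabinets metaphor maps naturally onto a tree of containers. Here is a minimal Python sketch, assuming a hypothetical `Container` class invented for this example, that shows how moving a cabinet carries its files along with it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity so remove() finds this exact object
class Container:
    """A room (disk) or cabinet (directory) that can hold other containers."""

    name: str
    parent: Container | None = None
    children: list[Container] = field(default_factory=list)
    files: list[str] = field(default_factory=list)

    def add(self, child: Container) -> Container:
        child.parent = self
        self.children.append(child)
        return child

    def move_to(self, new_parent: Container) -> None:
        """Drag this container elsewhere; its contents come along untouched."""
        if self.parent is not None:
            self.parent.children.remove(self)
        new_parent.add(self)

    def path(self) -> str:
        parts = []
        node: Container | None = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))
```

Dragging a cabinet between rooms is then a single `move_to` call, and nothing inside the cabinet needs to be touched.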

With this kind of metaphor I want to steer clear of someone walking into a 3D room in a Doom III style manner and moving a hand around to pick up files. This whole metaphor is based on iconic meaning tied in with a real world relationship between the objects. Here is a use case:

* The user clicks on the file room icon and we see two rooms (for two hard disks) appear on the main body of the desktop
* The user clicks on one room and the selected room grows to a larger size. Inside this room we have a 3D representation of a normal room with file cabinets. If the user clicks on a cabinet the full room goes very transparent and the cabinet increases in size and we can look inside at the contents

Creating content. E.g. burning a CD

The concept of burning a CD follows my ideas for creating any type of simple content. For this we need to identify the core components of the object we are creating, and put on the screen a simple template that allows the user to click on the relevant part of the object to change it. For a burnable CD we will typically have the CD itself and a cover. We may also have a cover for the back of a CD case. Here is the use case:

* The user wants to create a new CD, so he/she opens up the media store cabinet and drags a blank CD onto the desktop (the media cabinet will classify data and audio CDs as separate disks - the user just selects the relevant type of disk)
* When the CD appears on the desktop it is in an isometric view so the user can click on the extruded cover or the CD itself
* When the user clicks on the CD, he/she will be taken to the file store if a data CD is being created, or to the music library if an audio CD is being created. The user can then drag files onto the CD and an overlay box will say what files are on the CD and how much space is left (a visual representation of space should be used)
* With the CD layout ready, the user can then drag the CD into the CD press machine icon which will then burn it. When the CD is finished the computer will do some checks to see if the same files that were requested to be burned on the CD can be read; if they cannot be read the computer will mark the CD as damaged and ask if another CD from the media store can be used

What could be useful for this case is that when the user buys some new CDs he/she is encouraged to add them to the media store - this way the computer can let the user know when he/she is running out of media. This is particularly useful with the computer checking if the CDs are working or damaged when the burning process is finished.
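The post-burn check in the use case above could be sketched like this in Python. It assumes the burned disc is mounted as an ordinary directory; comparing SHA-256 checksums is one possible way to do the check, not something the use case prescribes, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def damaged_files(source_dir: Path, disc_dir: Path, names: list[str]) -> list[str]:
    """Return the requested files that are unreadable or differ on the disc."""
    bad = []
    for name in names:
        try:
            ok = sha256_of(disc_dir / name) == sha256_of(source_dir / name)
        except OSError:
            # A file that cannot be read at all counts as damaged.
            ok = False
        if not ok:
            bad.append(name)
    return bad
```

If the returned list is not empty, the desktop could mark the CD as damaged and offer another blank from the media store.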

Device handling

When a user plugs in a device, it should be visually represented on the screen. This will make an intrinsic link between the physical device and the virtual device, although they may look different physically (this is the biggest problem). With this device on screen, the user should be able to interact with it in a similar way to the real device. Let us assume we are plugging in a digital camera:

* The user plugs the camera into the USB port. In the top right corner of the screen a 3D representation of the camera shows up and starts spinning around
* When the user clicks on the camera, the camera appears in the body of the desktop much larger
* The view will now zoom into the camera screen and the user can view images in a series of thumbnails
* The user can then drag the pictures to their photo album (this photo album will be a visual representation of a book and the user can add pictures to the different pages of the book)

This system is not radically different to the current method of viewing pictures on a drive, but we are connecting together the concept of pictures on the device and actually dragging them to somewhere useful.

These use cases are not necessarily the right way to do things, but they provide a starting point for discussion. With more consideration and some prototypes we can better target the 3D aspects of the interface in the applications and make these use cases more representative of how we physically interact with the world.


I firmly believe that the 3D desktop environment has some great potential, but it needs to combine the best elements of the 2D methods we currently use and the innovative 3D ideas we will consider in the desktop of the future. This article has been written to hopefully pique the interest and ideas of people to think about how we can create an interface that is far easier to use and more representative of the real world.

The biggest challenge when implementing an interface such as this is how far you represent reality. As an example, when you plug your camera in and look at the pictures on the virtual screen, you should really be able to use the functions on the camera as if it was the physical device, but the software limits this potential to merely grabbing pictures and maybe taking a few shots. In this sense the physical representation cannot be fully imitated - we simply need to get a good batting average.

I would love to hear your thoughts on all of this, so feel free to get in touch with me or scribe your thoughts down in the comments box below. I am as interested in learning new ideas as in coming up with them; this could really mark a new wave in the Open Source desktop revolution.

Jono Bacon has been working as a full time writer and technology consultant/developer since 2000 and has worked for a variety of publishers and companies.

oreillynet.com Copyright © 2004 O'Reilly Media, Inc.

