Laurence Cliffe

Intermedia arts | creative technology | research through design


Augmented Sound Reality (ASR) with the Web Audio API?

I think the Web Audio API remains the most attractive option at the moment for realising a mobile Augmented Sound Reality (ASR) system: it is now pretty well supported across most platforms, and I like the web angle for its accessibility and transparency for users. I’m also interested in how this approach could potentially be delivered through an institution’s existing web presence; there’s an interesting example of this available on the V&A’s website.

I’ve made a start on a second prototype which uses the Web Audio API in conjunction with the Generic Sensor API, which is supported across platforms to varying degrees [2], to deliver a browser- and phone-based solution.

This builds on some of the ideas of the last prototype, albeit using phone sensor input to control the audio nodes, rather than the BLE interface I used previously. This is hopefully going to look something like:

Hat + phone sensors + audio API + bone conducting headphones = the beginnings of a mobile augmented sound reality system
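
As a rough sketch of how phone sensor input might drive an audio node: the Generic Sensor API's AbsoluteOrientationSensor reports orientation as a quaternion [x, y, z, w]. The helper names, the assumption that the phone lies flat (on the hat, say) so that yaw is the rotation about the z axis, and the simple sine pan law are all my own illustrative choices rather than the prototype's actual code.

```typescript
// Orientation quaternion in the order [x, y, z, w], as exposed by
// AbsoluteOrientationSensor.quaternion.
type Quaternion = [number, number, number, number];

// Yaw (rotation about the z axis) in radians, range -PI..PI.
// Assumption: phone lying flat, counter-clockwise positive seen from above.
function yawFromQuaternion(q: Quaternion): number {
  const x = q[0], y = q[1], z = q[2], w = q[3];
  return Math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z));
}

// Hypothetical mapping from heading to a stereo pan value.
// sourceBearing uses the same convention as yaw (radians, counter-clockwise).
// Returns -1 (hard left) .. 1 (hard right); front and back are not
// distinguished by this simple law.
function panForHeading(yaw: number, sourceBearing: number): number {
  return Math.sin(yaw - sourceBearing);
}
```

In the browser, the result could be written to a StereoPannerNode's pan.value each time the sensor fires a reading event.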

After taking a look at Hazzard et al.’s (2015) design notes for ‘Sculpting a Mobile Musical Soundtrack’, and digging a little deeper into the capabilities of the Web Audio API, I think some of the ideas mentioned in relation to ‘ideal mobile authoring functionality’ could be realised. These include:

  1. Seamless looping
  2. Triggering of files/loops in musical time
  3. Determining points of playback within an audio file
  4. Stereo spatialisation
  5. Tagging of audio content with bpm/time signature/bar length
  6. Distance from a defined point to trigger audio content
  7. Distance from a defined point to trigger audio parameter manipulation
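
Items 6 and 7 could be sketched roughly as below. This assumes the user's position is already available as 2D coordinates in metres (which it is not yet, indoors); the type and function names and the linear fall-off are hypothetical choices for illustration.

```typescript
// Hypothetical 2D position in metres within the event space.
interface Position2D {
  x: number;
  y: number;
}

// Straight-line distance between two positions.
function distanceBetween(a: Position2D, b: Position2D): number {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  return Math.sqrt(dx * dx + dy * dy);
}

// Item 6: trigger audio content once the user is within a given radius.
function shouldTrigger(distance: number, triggerRadius: number): boolean {
  return distance <= triggerRadius;
}

// Item 7: map distance to a parameter value, here a gain between 0 and 1.
// Full level inside innerRadius, silent beyond outerRadius, linear between.
function distanceToGain(distance: number, innerRadius: number, outerRadius: number): number {
  if (distance <= innerRadius) return 1;
  if (distance >= outerRadius) return 0;
  return (outerRadius - distance) / (outerRadius - innerRadius);
}
```

The returned value could then be applied to a GainNode's gain parameter, or rescaled to drive any other audio parameter such as a filter cutoff.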

Potential solutions to some of these using the Web Audio API are evidenced in online tutorials and examples (Archibald, 2016; Robertson, 2014). These cover seamless looping, defining different playback points within one loop or audio file, and defining time signatures and tempos for dynamic in-time sequencing and musical looping. There’s also spatialisation and some interesting sequencing approaches.
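
To make the musical-time idea concrete, here is a minimal sketch of the arithmetic involved, assuming each loop is tagged with its tempo, beats per bar and length in bars. The LoopTag shape and the function names are my own and are not taken from the cited tutorials.

```typescript
// Hypothetical tag attached to a loop: tempo, time signature numerator, length.
interface LoopTag {
  bpm: number;
  beatsPerBar: number;
  bars: number;
}

// Length of one bar in seconds.
function barDuration(tag: LoopTag): number {
  return (60 / tag.bpm) * tag.beatsPerBar;
}

// Next bar boundary at or after `now`, where `transportStart` is the audio
// clock time at which bar 0 began. Both are in seconds on the same clock.
function nextBarTime(now: number, transportStart: number, tag: LoopTag): number {
  const elapsed = now - transportStart;
  if (elapsed <= 0) return transportStart;
  const barsElapsed = Math.ceil(elapsed / barDuration(tag));
  return transportStart + barsElapsed * barDuration(tag);
}

// Start and end offsets (seconds into the file) for a region of whole bars,
// clamped to the tagged length of the loop. firstBar is zero-based.
function loopRegion(tag: LoopTag, firstBar: number, barCount: number): { loopStart: number; loopEnd: number } {
  const lastBar = Math.min(firstBar + barCount, tag.bars);
  return { loopStart: firstBar * barDuration(tag), loopEnd: lastBar * barDuration(tag) };
}
```

With the Web Audio API, the value from nextBarTime (using the AudioContext's currentTime as `now`) could be passed as the `when` argument of AudioBufferSourceNode.start(), and the region could be assigned to the node's loopStart and loopEnd with loop set to true.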

Additionally, Chris Greenhalgh’s system design notes for the DAOPlayer [4] proved useful for beginning to think about interaction design and the possible ‘entities’ that may be required to make things happen.

Artefact interaction design idea (based on the idea of promoting cultural engagement, disseminating audio archival content and inciting an interaction). This also draws upon Hazzard et al.’s (2015) designs for a ‘Musical trajectory through an exhibit’.

Based on the LHV example I’ve been using, this could translate into something like this:

Possible system entities could include:

  1. Event – The space/place of the cultural event to be augmented with sound
  2. Zone –  The area within this space (for example a specific room within a gallery building or geographic location at the event)
  3. Collection – A group of related/similar objects make up a collection (may help to promote interaction with similar objects based on user interest and to theme audio content and delivery)
  4. Artefact – the specific object to which the audio would be attached
  5. User – To store interests and to collaborate with other like-minded/interested users?
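
One way these entities might be expressed as types is sketched below. All field names are assumptions on my part, the nesting (event contains zones, zones contain collections, collections contain artefacts) is just one possible reading of the list, and Event is renamed CulturalEvent to avoid clashing with the browser's built-in Event type.

```typescript
// 4. Artefact: the specific object to which audio is attached.
interface Artefact {
  id: string;
  title: string;
  audioUrl: string; // hypothetical: location of the attached audio file
}

// 3. Collection: related objects grouped under a theme.
interface Collection {
  id: string;
  theme: string;
  artefacts: Artefact[];
}

// 2. Zone: an area within the event space, such as a room.
interface Zone {
  id: string;
  name: string;
  collections: Collection[];
}

// 1. Event: the space/place being augmented with sound.
interface CulturalEvent {
  id: string;
  name: string;
  zones: Zone[];
}

// 5. User: stores interests, which are matched against collection themes.
interface User {
  id: string;
  interests: string[];
}

// Artefacts across the whole event whose collection theme matches an interest.
function artefactsForUser(event: CulturalEvent, user: User): Artefact[] {
  const matches: Artefact[] = [];
  for (const zone of event.zones) {
    for (const collection of zone.collections) {
      if (user.interests.indexOf(collection.theme) !== -1) {
        for (const artefact of collection.artefacts) matches.push(artefact);
      }
    }
  }
  return matches;
}
```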

Although this is all very exciting, the problem of indoor positioning and object proximity persists. But I do have an idea for prototype v3: Camera (track the object, not the user)

For use with the Web Audio API, the camera could be accessed using getUserMedia() and used to track the actual object, a shape or a colour code etc. (similar to artcodes or aestheticodes perhaps). This could also be useful for tracking other users for collaborative purposes.

I think it should be possible using either Tracking.js or JSFeat, though some testing will be needed to see what level of accuracy and consistency it can actually achieve; real-time update rates may also be an issue.

The user’s proximity to the object could then be determined by the pixel size of the object or pattern code in the video window. The coordinates of the object in the video window could determine the position of the object’s associated audio in stereo space. If we know the user’s position in relation to a stationary object, could the user’s actual position be related back into the system?
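
A minimal sketch of that mapping, assuming the tracker returns a bounding box in pixels and that the object's real width and the camera's focal length (in pixels) are known or calibrated. The names are hypothetical, and the distance estimate uses a simple pinhole camera approximation that would need testing against a real phone camera.

```typescript
// Hypothetical tracker output: bounding box of the object in the video frame.
interface BoundingBox {
  x: number; // left edge, pixels
  y: number; // top edge, pixels
  width: number;
  height: number;
}

// Pinhole approximation: distance = real width * focal length / pixel width.
// realWidthMetres and focalLengthPx are assumed to come from calibration.
function estimateDistance(box: BoundingBox, realWidthMetres: number, focalLengthPx: number): number {
  return (realWidthMetres * focalLengthPx) / box.width;
}

// Horizontal centre of the box mapped to a stereo pan value:
// -1 at the left edge of the frame, 0 at the centre, 1 at the right edge.
function panFromBox(box: BoundingBox, frameWidth: number): number {
  const centreX = box.x + box.width / 2;
  const pan = (centreX / frameWidth) * 2 - 1;
  return Math.max(-1, Math.min(1, pan));
}
```

The pan value fits the range of a StereoPannerNode's pan parameter, and the distance estimate could feed a gain or filter mapping.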

Or I could even use a 360º camera and map the position of the artefact relative to the user in 360º sound space.
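
As a sketch of the 360º idea: assuming the camera delivers an equirectangular frame whose centre column is straight ahead of the user (an assumption that would depend on the camera), a pixel column can be turned into an azimuth and then into a position around the listener. The coordinates follow the Web Audio API's default listener, which faces along negative z with positive x to the right. Function names are hypothetical.

```typescript
interface Vec3 {
  x: number;
  y: number;
  z: number;
}

// Pixel column in an equirectangular frame to azimuth in degrees:
// -180 at the left edge, 0 at the centre (straight ahead), 180 at the right.
function azimuthFromColumn(column: number, frameWidth: number): number {
  return (column / frameWidth) * 360 - 180;
}

// Azimuth (degrees, clockwise from straight ahead) and distance to a position
// relative to a listener at the origin facing -z, with +x to the right.
function positionFromAzimuth(azimuthDeg: number, distance: number): Vec3 {
  const a = (azimuthDeg * Math.PI) / 180;
  return { x: distance * Math.sin(a), y: 0, z: -distance * Math.cos(a) };
}
```

The resulting coordinates could be written to a PannerNode's positionX, positionY and positionZ parameters.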


  1. Hazzard, A., Benford, S., & Burnett, G. (2015). Sculpting a Mobile Musical Soundtrack (pp. 387–396). Presented at the 33rd Annual ACM Conference, New York, New York, USA: ACM Press.