Real Time Communications in Web browser

W3C position paper, September 2010

Author: Philippe Le Hégaret, W3C

Introduction

While HTML has been widespread for more than 15 years now, its scope and usage keep expanding. It has been used and profiled in many environments, such as cellphones, televisions, tablets, and e-books. Attention around HTML5 keeps rising in several, not necessarily related, markets. Nowadays, HTML is becoming an application development platform, including for cloud computing, and it goes beyond the Web or rich internet applications.

From a general audience's point of view, HTML5 refers to a set of technologies that together form the future Open Web Platform. These technologies include the HTML5 specification itself but also CSS3, SVG, Geolocation, 2D API, Web Sockets, and others. The boundary of this set of technologies is informal and changes over time. Several standards organizations are building blocks for this platform:

Device APIs and Policy Working Group

W3C announced the creation of a Device APIs and Policy (DAP) Working Group to create client-side APIs that enable the development of Web applications and Web Widgets that interact with device services such as the camera or microphone. Additionally, the group will produce a framework for the expression of security policies that govern access to security-critical APIs (see also the report from the December 2008 W3C workshop on Security for Access to Device APIs from the Web).

Privacy requirements

The DAP Working Group has published several requirements documents about Device APIs, including the Device API Privacy Requirements.

Among the privacy requirements, one should note:

Codecs and media files

Among the deliverables of the DAP Working Group, the HTML Media Capture specification is the most relevant here. It defines HTML form enhancements and an API that provide access to the audio, image, and video capture capabilities of the device. The API gives access to the format and codecs used by the medium. Like the HTML5 specification, the HTML Media Capture specification is limited in that it does not recommend baseline audio and video codecs. This is a known issue, due to the licensing rights and use restrictions associated with codecs and the lack of consensus among browser vendors. Work is also happening within the W3C Web Applications Working Group around a File API (see File API and File API: Writer). The MediaFile API builds on top of the File API. Those file APIs should also be relevant to further communication enhancements, such as P2P or streaming.

Streaming

One current interest is to keep extending the scope of HTML applications by handling real-time communication, especially with regard to video and audio interaction. W3C and IETF are currently cooperating to develop Web Sockets: W3C is working on the Web Sockets API while IETF is working on the protocol layer. In any case, the HTML5 media resource may be adapted to contain the description necessary for handling adaptive and live streaming, and no change is necessary at the HTML markup level. Ian Hickson is proposing to add a device element to HTML with an associated API. The HTML5 media API itself could be extended to accept a media stream. The Audio Incubator Group is already working on exposing underlying audio data to the JavaScript layer. The underlying protocol may take advantage of the WebSocket protocol, a P2P protocol, or others if necessary.
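To make the idea of adaptive streaming concrete, here is a minimal sketch of the selection step a client might perform: given a description of the available encodings of a media resource and a bandwidth estimate, pick the highest bitrate that fits. The `Rendition` type, the `pickRendition` function, and the 0.8 safety margin are all hypothetical names and values chosen for illustration; none of them comes from HTML5 or any other specification.

```typescript
// Hypothetical description of one encoding of a media resource.
interface Rendition {
  label: string;
  bitrateKbps: number;
}

// Pick the highest-bitrate rendition that fits within the measured
// bandwidth, scaled by a safety margin (an assumed tuning value).
// Falls back to the lowest-bitrate rendition when nothing fits.
function pickRendition(
  renditions: Rendition[],
  measuredKbps: number,
  safety: number = 0.8
): Rendition {
  if (renditions.length === 0) {
    throw new Error("no renditions available");
  }
  // Sort a copy in ascending bitrate order so the input is not mutated.
  const sorted = [...renditions].sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  const budget = measuredKbps * safety;
  let choice = sorted[0];
  for (const r of sorted) {
    if (r.bitrateKbps <= budget) {
      choice = r;
    }
  }
  return choice;
}
```

A live player would rerun this selection as its bandwidth estimate changes. How the estimate is obtained, and how the chosen stream is transported (WebSocket, P2P, or otherwise), is outside this sketch.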

Performance

W3C started the Web Performance Working Group in August 2010 to provide methods to measure aspects of application performance of user agent features and APIs. The Resource Timing draft defines an interface for Web applications to access timing information related to HTML elements. W3C is also investigating the creation of a Web Performance Interest Group dedicated to creating a faster user experience on the Web.
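As an illustration of what an application could do with per-resource timing information, the following sketch ranks resources by how long they took to load. The `TimingEntry` shape and its field names are assumptions made for this example, loosely modeled on the idea of a start and end timestamp per resource; they are not taken from the Resource Timing draft.

```typescript
// Hypothetical timing record for one fetched resource (times in milliseconds).
interface TimingEntry {
  name: string;
  startTime: number;
  responseEnd: number;
}

// Elapsed time between the start of the fetch and the end of the response.
function loadDuration(entry: TimingEntry): number {
  return entry.responseEnd - entry.startTime;
}

// Return up to `limit` entries, slowest first, without mutating the input.
function slowestResources(entries: TimingEntry[], limit: number): TimingEntry[] {
  return [...entries]
    .sort((a, b) => loadDuration(b) - loadDuration(a))
    .slice(0, limit);
}
```

An application could feed such a ranking into its own diagnostics to decide which resources to defer or cache.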

Security

Independently of the protocol used to communicate, the information needs to be subject to the origin policies applied by the user agent. For the moment, Web applications are restricted by the same-origin policy, which prevents scripts from exchanging or accessing data across origins. The Web Applications Working Group is working on Cross-Origin Resource Sharing as well as Uniform Messaging Policy, Level One, to enable cross-site communication.
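The same-origin check itself can be sketched in a few lines. Two URLs share an origin when their scheme, host, and port all match. The example below relies on the standard `URL` class, which reports an empty `port` when the port is the default for the scheme; the `sameOrigin` function name is hypothetical, and this is a simplified illustration, not the full algorithm a user agent applies.

```typescript
// Compare the (scheme, host, port) tuple of two absolute URLs.
// The URL class lowercases the host and normalizes default ports to "",
// so "http://example.org" and "http://EXAMPLE.org:80" compare as equal.
function sameOrigin(a: string, b: string): boolean {
  const ua = new URL(a);
  const ub = new URL(b);
  return (
    ua.protocol === ub.protocol &&
    ua.hostname === ub.hostname &&
    ua.port === ub.port
  );
}
```

Under this check, a page served from `http://example.org/` may not read data from `http://api.example.org/` or `https://example.org/`. Mechanisms such as Cross-Origin Resource Sharing let a server opt in to relaxing that restriction.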

Conclusion

W3C is interested in continuing to expand the reach of the Web platform. We believe that there is, at a minimum, a need to handle adaptive and live streaming, and we are interested in providing proper API access to Web applications under the W3C Royalty-Free licensing requirements. We will also continue to evaluate the situation around video and audio codecs, to try to resolve the existing limitations in the HTML5 specification.