Audible Panorama: automatic spatial audio generation for panorama imagery

Given a panorama image, our system runs object detection on slices sampled from the panorama image.
Each object detected in a slice is marked with a bounding box and a confidence score.

NOTE: We ran our approach on 1305 images, but due to copyright issues we provide only the URLs to download the images instead of putting the images on our website.

Results:

The results file includes 2 folders:

panorama: includes all the panorama images we captured ourselves and the file downloadLinks.txt, which contains the download links to the Flickr images.

data: includes all the raw data and the result data generated by our system for each scene. In the data folder, each sub-folder is associated with a panorama image in the panorama folder. Each sub-folder contains:

scene: includes the visualization of the object detection for the slices of the panorama image.

data.ini: the raw data of scene classification, object detection, and object recognition.

fullSound.ini: the result data generated by our system.

Format of downloadLinks.txt:
For each line, we have:

Image name,

License type,

The exact download URL of the image,

The original webpage on which the image was posted.

Example: ZZZ25086550514, license:1, farm2.staticflickr.com/1629/25086550514_240a1a97c4_o.jpg, flickr.com/photos/24128368@N00/25086550514/

License type:
0: All Rights Reserved

1: Attribution-NonCommercial-ShareAlike License

2: Attribution-NonCommercial License

3: Attribution-NonCommercial-NoDerivs License

4: Attribution License

5: Attribution-ShareAlike License

6: Attribution-NoDerivs License

7: No known copyright restrictions

8: United States Government Work

9: Public Domain Dedication (CC0)

10: Public Domain Mark
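
As a minimal sketch of reading this format, the following Python splits one line of downloadLinks.txt into its four fields and maps the license id to the names listed above. The names `LICENSES`, `DownloadLink`, and `parse_link_line` are illustrative and not part of the released files. The sketch assumes that the fields are separated by commas and that the license field always has the form license:N, as in the example line.

```python
from typing import NamedTuple

# License ids and names, copied from the list above.
LICENSES = {
    0: "All Rights Reserved",
    1: "Attribution-NonCommercial-ShareAlike License",
    2: "Attribution-NonCommercial License",
    3: "Attribution-NonCommercial-NoDerivs License",
    4: "Attribution License",
    5: "Attribution-ShareAlike License",
    6: "Attribution-NoDerivs License",
    7: "No known copyright restrictions",
    8: "United States Government Work",
    9: "Public Domain Dedication (CC0)",
    10: "Public Domain Mark",
}


class DownloadLink(NamedTuple):
    name: str
    license_id: int
    image_url: str
    page_url: str


def parse_link_line(line: str) -> DownloadLink:
    """Split one comma-separated line into its four fields."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    name, license_field, image_url, page_url = parts
    # Assumed form "license:N"; keep only the number after the colon.
    license_id = int(license_field.split(":", 1)[1])
    return DownloadLink(name, license_id, image_url, page_url)


example = (
    "ZZZ25086550514, license:1, "
    "farm2.staticflickr.com/1629/25086550514_240a1a97c4_o.jpg, "
    "flickr.com/photos/24128368@N00/25086550514/"
)
link = parse_link_line(example)
print(link.name, "->", LICENSES[link.license_id])
```

Checking the license id before downloading makes it easy to skip images whose license does not fit the intended use.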

Format of data.ini:
The data.ini file contains the raw data of scene classification, object detection, and object recognition.

We organize this file as section-key-value-comment entries.
The format of each entry is: [section] key = value ; comment.
*** Note: anything after the semicolon «;» is a comment, and comments do not carry any useful information. ***
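
The entry format described above can be read with Python's standard configparser module; the following is a minimal sketch, not the loader used by our system. Passing `inline_comment_prefixes=(";",)` drops everything after the semicolon, and setting `optionxform = str` keeps the capitalization of key names. The helper name `load_ini`, the section name `Base`, and all values in `SAMPLE` are assumptions made for illustration, not real data.

```python
import configparser


def load_ini(text: str) -> configparser.ConfigParser:
    """Parse INI text, discarding trailing '; comment' parts."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    # By default configparser lowercases keys; keep names like frameCount as-is.
    parser.optionxform = str
    parser.read_string(text)
    return parser


# Hypothetical sample in the described format; the values are placeholders.
SAMPLE = """
[Base]
frameCount = 10 ; number of slices
objectCount = 3 ; placeholder value

[frame_0]
objCount = 2 ; placeholder value
"""

ini = load_ini(SAMPLE)
print(ini["Base"]["frameCount"])
print(ini.getint("frame_0", "objCount"))
```

To read a file from disk instead of a string, `parser.read(path)` can be used in place of `parser.read_string(text)`.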

Base Section:

The Base section provides overall information about the scene classification, the object detection, and the object recognition.

frameCount: the number of slices of the panorama image; it is always 10.

objectCount: the total number of objects. This count includes the duplicated objects, which we remove later.

objectCatalogCount: the number of object catalogs (categories) detected in this scene.

objectCatalogList: the list of object catalogs.

sortedDescription: the list of scene catalogs sorted by their scores.

sortedScore: the list of scores of the scene catalogs.

unDuplicatedObjectIds: the ids of the unduplicated objects.

unDuplicatedObjectIdsCount: the number of unduplicated objects.

Frame Section:

The format of the Frame section is [frame_X] where X is the id of the frame.

cameraEulerAngle: the Euler angle of the camera while shooting this frame.

imageWidth & imageHeight: the size of the frame.

file: the path of the screenshot of the frame.

objList: the list of the ids of the objects detected in this frame. (Includes the duplicated objects.)

objCount: the number of objects. (Includes the duplicated objects.)

description: the list of the scene classifications of the frame.

score: the list of scores associated with the description.

Object Section:

The format of the Object section is [Object_X] where X is the id of the object.

frame: indicates which frame the object belongs to.

tag: the tag of this object.

minY, minX, maxY, maxX: the edges of the bounding box.

center, leftTop, leftBottom, rightTop, rightBottom: the Euler angles of this object. The angle of the center can be converted into the direction of the object relative to the origin of the virtual world.

depth: the depth of the object based on the reference object.

action: the action tag detected by the object recognition. Only the unduplicated person objects have this key.

Format of fullsounds.ini:
The fullsounds.ini file is generated by our system based on our approach.
The system uses this file to place the sounds in the scene.

We organize this file as section-key-value-comment entries.
The format of each entry is: [section] key = value ; comment.
*** Note: anything after the semicolon «;» is a comment, and comments do not carry any useful information. ***

Base Section:

The Base section provides overall information about the scene classification, the object detection, and the object recognition.

soundCount: the number of objects in this scene.

Sound Section:

The format of the Sound section is [Sound_X] where X is the id of the sound.

tag: the tag of this sound.

soundFile: the sound file the system used.

bObject: indicates whether this sound is an object or background. True means object and False means background.

location: the location of this sound.
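
As a minimal sketch of how the Sound sections could be consumed, the following Python separates object sounds from background sounds using the bObject key. It assumes that section names start with `Sound_` as described above and that bObject is written as True or False. The function name `split_sounds` and every value in `SAMPLE` are placeholders chosen for illustration, not data from our results.

```python
import configparser


def split_sounds(text: str):
    """Return (object_sounds, background_sounds) as lists of (tag, soundFile)."""
    parser = configparser.ConfigParser(inline_comment_prefixes=(";",))
    parser.optionxform = str  # keep key names such as soundFile case-sensitive
    parser.read_string(text)

    objects, backgrounds = [], []
    for section in parser.sections():
        if not section.startswith("Sound_"):
            continue  # skip the Base section
        entry = parser[section]
        record = (entry["tag"], entry["soundFile"])
        # getboolean accepts "True"/"False" regardless of letter case.
        if entry.getboolean("bObject"):
            objects.append(record)
        else:
            backgrounds.append(record)
    return objects, backgrounds


# Hypothetical sample; tags and file names are placeholders.
SAMPLE = """
[Base]
soundCount = 2

[Sound_0]
tag = car ; placeholder
soundFile = car.mp3
bObject = True

[Sound_1]
tag = street
soundFile = street.mp3
bObject = False
"""

object_sounds, background_sounds = split_sounds(SAMPLE)
print(object_sounds)
print(background_sounds)
```

The location key is left out of the sketch on purpose, because the way its value is written is not specified here.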

Sound Database:

The soundDatabase file includes 2 folders: background and soundableObject.
The sub-folders of these two folders are named based on the tags for scene classification and object recognition.
The mp3 files in those sub-folders are the sound sources we used in our system.
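
The folder layout above suggests a simple lookup from a tag to its candidate sound files. The following Python is a minimal sketch of such a lookup, assuming that the sub-folder name matches the tag exactly; the function name `find_sounds` and the `database_root` parameter are hypothetical and not part of our system.

```python
from pathlib import Path

# The two top-level folders of the sound database.
KINDS = ("background", "soundableObject")


def find_sounds(database_root, kind, tag):
    """List the mp3 files stored under <database_root>/<kind>/<tag>/."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    folder = Path(database_root) / kind / tag
    if not folder.is_dir():
        return []  # no sounds are available for this tag
    return sorted(folder.glob("*.mp3"))
```

For example, `find_sounds("soundDatabase", "background", tag)` would return the background sounds for a scene tag, or an empty list if no sub-folder with that name exists.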
