Audible Panorama: automatic spatial audio generation for panorama imagery


Given a panorama image, our system runs object detection on slices sampled from the panorama image.
Each detected object in a slice is marked by a bounding box with a confidence score.
***NOTE: We ran our approach on 1305 images, but due to copyright issues we provide download URLs for the Flickr images instead of hosting them on our website.***
Results:
The results file includes 2 folders:
panorama:
    includes all the panorama images we captured ourselves, plus the file downloadLinks.txt, which contains the download links to the Flickr images.
data:
    includes all the raw data and the result data generated by our system for each scene.
In the data folder, each sub-folder is associated with a panorama image in the panorama folder and contains:
scene:
    includes the visualizations of the object detection for the slices of the panorama image.
data.ini:
    the raw data of scene classification, object detection, and object recognition.
fullSound.ini:
    the result data generated by our system.
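The folder layout above can be checked with a short script. This is a minimal Python sketch, assuming each scene sub-folder of data holds data.ini and fullSound.ini under exactly those names; the function name `find_complete_scenes` is hypothetical and not part of our release.

```python
from pathlib import Path


def find_complete_scenes(data_dir):
    """Return the names of scene sub-folders that contain both INI files.

    Assumed layout (file names taken from this README):
        data/<scene>/data.ini
        data/<scene>/fullSound.ini
    """
    complete = []
    for scene_dir in sorted(Path(data_dir).iterdir()):
        if not scene_dir.is_dir():
            continue  # skip stray files at the top level of data/
        has_raw = (scene_dir / "data.ini").is_file()
        has_result = (scene_dir / "fullSound.ini").is_file()
        if has_raw and has_result:
            complete.append(scene_dir.name)
    return complete
```

Scenes missing either file are simply left out, so the returned list can be compared against the panorama folder to spot incomplete downloads.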
Format of downloadLinks.txt:
Each line contains four comma-separated fields:
    Image name,
    License type,
    The exact download URL of the image,
    The original webpage on which the image was posted.

Example: ZZZ25086550514, license:1, farm2.staticflickr.com/1629/25086550514_240a1a97c4_o.jpg, flickr.com/photos/24128368@N00/25086550514/
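A minimal sketch for reading one line of downloadLinks.txt, assuming the fields are separated by commas (as in the example above) and that no field itself contains a comma; `parse_download_line` and `DownloadLink` are illustrative names, not part of our release.

```python
from typing import NamedTuple


class DownloadLink(NamedTuple):
    image_name: str
    license_id: int
    image_url: str
    page_url: str


def parse_download_line(line):
    """Split one downloadLinks.txt line into its four fields.

    Assumes comma separators and a license field shaped like "license:<id>".
    """
    fields = [field.strip() for field in line.strip().split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    name, license_field, image_url, page_url = fields
    prefix, _, license_id = license_field.partition(":")
    if prefix != "license":
        raise ValueError(f"unexpected license field: {license_field!r}")
    return DownloadLink(name, int(license_id), image_url, page_url)
```

The example URLs carry no scheme, so a downloader would likely need to prepend one (for instance https://) before fetching.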

License type:
0: All Rights Reserved

1: Attribution-NonCommercial-ShareAlike License

2: Attribution-NonCommercial License

3: Attribution-NonCommercial-NoDerivs License

4: Attribution License

5: Attribution-ShareAlike License

6: Attribution-NoDerivs License

7: No known copyright restrictions

8: United States Government Work

9: Public Domain Dedication (CC0)

10: Public Domain Mark
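The license IDs above can be kept as a lookup table when filtering or summarizing the downloaded images. A minimal sketch; `LICENSE_NAMES`, `license_name`, and `summarize_licenses` are hypothetical helper names, and the names simply copy the list above.

```python
from collections import Counter

# License IDs and names, copied from the list above.
LICENSE_NAMES = {
    0: "All Rights Reserved",
    1: "Attribution-NonCommercial-ShareAlike License",
    2: "Attribution-NonCommercial License",
    3: "Attribution-NonCommercial-NoDerivs License",
    4: "Attribution License",
    5: "Attribution-ShareAlike License",
    6: "Attribution-NoDerivs License",
    7: "No known copyright restrictions",
    8: "United States Government Work",
    9: "Public Domain Dedication (CC0)",
    10: "Public Domain Mark",
}


def license_name(license_id):
    """Return the human-readable license name for a numeric license ID."""
    try:
        return LICENSE_NAMES[license_id]
    except KeyError:
        raise ValueError(f"unknown license ID: {license_id}") from None


def summarize_licenses(license_ids):
    """Count how many images use each license, keyed by license name."""
    return Counter(license_name(i) for i in license_ids)
```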

Format of data.ini:
The data.ini file contains the raw data of scene classification, object detection, and object recognition.

The file is organized as section-key-value-comment entries.
Each entry has the form: [section] key = value ; comment.
*** Note: anything after the semicolon «;» is a comment and carries no useful information.***
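Because the file follows these INI conventions, Python's standard configparser module can read it with two adjustments: inline «;» comments must be enabled explicitly, and key case must be preserved (ConfigParser lowercases keys by default). A minimal sketch, assuming UTF-8 encoding and a section literally named [Base]; `make_parser` and `load_ini` are hypothetical helpers.

```python
import configparser


def make_parser():
    """Create a ConfigParser configured for the entry format described above."""
    parser = configparser.ConfigParser(
        # Strip "; comment" at the end of a value line (needs a space before ";").
        inline_comment_prefixes=(";",),
        # Values are plain strings; disable "%" interpolation.
        interpolation=None,
    )
    # Keep key case such as "frameCount" instead of lowercasing it.
    parser.optionxform = str
    return parser


def load_ini(path):
    """Read a data.ini file from disk (UTF-8 is an assumption)."""
    parser = make_parser()
    with open(path, encoding="utf-8") as f:
        parser.read_file(f)
    return parser
```

For example, `load_ini(".../data.ini").getint("Base", "frameCount")` would return the slice count as an integer.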
Base Section:
The Base section provides overall information about the scene classification, the object detection, and the object recognition.
frameCount
    : the number of slices sampled from the panorama image; it is always 10.
objectCount
    : the total number of detected objects, including duplicates (duplicates are removed later).
objectCatalogCount
    : the number of object categories detected in this scene.
objectCatalogList
    : the list of detected object categories.
sortedDescription
    : the list of scene categories, sorted by score.
sortedScore
    : the list of scores of the scene categories.
unDuplicatedObjectIds
    : the IDs of the unique (de-duplicated) objects.
unDuplicatedObjectIdsCount
    : the number of unique (de-duplicated) objects.
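A minimal consistency check for the Base section, assuming list-valued keys such as unDuplicatedObjectIds are comma-separated (the separator is not specified above, so it is a parameter); `split_list` and `check_base_section` are hypothetical names. It accepts any mapping from key to string, such as a configparser section.

```python
def split_list(value, sep=","):
    """Split a list-valued entry; the separator is an assumption."""
    return [item.strip() for item in value.split(sep) if item.strip()]


def check_base_section(base, sep=","):
    """Sanity-check the Base section of a data.ini file.

    Returns a list of human-readable problems; an empty list means
    no inconsistency was found.
    """
    problems = []
    if int(base["frameCount"]) != 10:
        problems.append("frameCount is not 10")
    unique_ids = split_list(base["unDuplicatedObjectIds"], sep)
    if len(unique_ids) != int(base["unDuplicatedObjectIdsCount"]):
        problems.append("unDuplicatedObjectIds does not match its count")
    # objectCount includes duplicates, so it can never be smaller.
    if len(unique_ids) > int(base["objectCount"]):
        problems.append("more unique objects than objects in total")
    return problems
```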
Frame Section:
The format of the Frame section is [frame_X], where X is the id of the frame.
cameraEulerAngle
    : the Euler angles of the camera when this frame was captured.
imageWidth & imageHeight
    : the size of the frame.
file
    : the path of the screenshot of the frame.
objList
    : the list of IDs of the objects detected in this frame (including duplicates).
objCount
    : the number of objects detected in this frame (including duplicates).
description
    : the list of scene classification labels for the frame.
score
    : the list of scores associated with the labels in description.
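Frame sections can be collected and ordered by their numeric id with a regular expression. A minimal sketch, assuming description and score are comma-separated lists of equal length (an assumption; scene labels that themselves contain commas would break it); `frames_in_order` and `top_scene_label` are hypothetical helpers.

```python
import re

# Matches section names such as "frame_0" ... "frame_9".
FRAME_SECTION = re.compile(r"frame_(\d+)$")


def frames_in_order(parser):
    """Return (frame_id, section) pairs for every [frame_X] section, sorted by X."""
    frames = []
    for name in parser.sections():
        match = FRAME_SECTION.match(name)
        if match:
            frames.append((int(match.group(1)), parser[name]))
    # Sort numerically so that frame_10 comes after frame_2.
    return sorted(frames, key=lambda pair: pair[0])


def top_scene_label(frame, sep=","):
    """Return the (label, score) pair with the highest score in one frame."""
    labels = [s.strip() for s in frame["description"].split(sep)]
    scores = [float(s) for s in frame["score"].split(sep)]
    return max(zip(labels, scores), key=lambda pair: pair[1])
```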
Object Section:
The format of the Object section is [Object_X], where X is the id of the object.
frame
    : indicates which frame the object belongs to.
tag
    : the tag of this object.
minY,minX,maxY,maxX
    : the coordinates of the bounding box.
center,leftTop,leftBottom,rightTop,rightBottom
    : the Euler angles of the object's center and of the four corners of its bounding box. The center angle can be converted into the direction of the object relative to the origin of the virtual world.
depth
    : the depth of the object, based on the reference object.
action
    : the action tag detected by object recognition. Only unique (de-duplicated) person objects have this key.
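Converting the center angle into a direction means mapping Euler angles to a unit vector. A minimal sketch, assuming the center value gives pitch and yaw in degrees and a y-up, z-forward convention in which positive yaw turns toward +x and positive pitch tilts downward (the convention used by, e.g., Unity); the actual field order and axis convention of data.ini are not specified here, and `euler_to_direction` is a hypothetical name.

```python
import math


def euler_to_direction(pitch_deg, yaw_deg):
    """Convert (pitch, yaw) in degrees into a unit direction vector (x, y, z).

    Assumed convention (not confirmed by the dataset): y is up, z is forward,
    positive yaw turns toward +x, positive pitch tilts the view downward.
    Roll does not change a viewing direction, so it is not needed.
    """
    pitch = math.radians(pitch_deg)
    yaw = math.radians(yaw_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        -math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )
```

Parsing the center string into pitch and yaw is left out because its exact layout is not documented above.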
Format of fullsounds.ini:
The fullsounds.ini file is generated by our system using our approach.
The system uses this file to place the sounds in the scene.

Like data.ini, this file is organized as section-key-value-comment entries.
Each entry has the form: [section] key = value ; comment.
*** Note: anything after the semicolon «;» is a comment and carries no useful information.***
Base Section:
The Base section provides overall information about the scene classification, the object detection, and the object recognition.
soundCount
    : the number of objects in this scene.
Sound Section:
The format of the Sound section is [Sound_X], where X is the id of the sound.
tag
    : the tag of this sound.
soundFile
    : the sound file the system used.
bObject
    : indicates whether this sound belongs to an object or to the background: True means object, False means background.
location
    : the location of this sound.
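A minimal sketch for separating object sounds from background sounds in fullsounds.ini, assuming the sound sections are literally named [Sound_X] and the parser is configured as for data.ini (inline «;» comments enabled, key case preserved); `split_sounds` is a hypothetical helper. configparser's getboolean accepts True and False in any letter case.

```python
import re

# Matches section names such as "Sound_0", "Sound_1", ...
SOUND_SECTION = re.compile(r"Sound_(\d+)$")


def split_sounds(parser):
    """Split [Sound_X] sections into object sounds and background sounds.

    Returns two lists of (tag, soundFile) tuples, in file order.
    """
    objects, backgrounds = [], []
    for name in parser.sections():
        if not SOUND_SECTION.match(name):
            continue  # skip [Base] and any other section
        section = parser[name]
        entry = (section["tag"], section["soundFile"])
        if section.getboolean("bObject"):
            objects.append(entry)
        else:
            backgrounds.append(entry)
    return objects, backgrounds
```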
Sound Database:
The soundDatabase file includes 2 folders: background and soundableObject.
The sub-folders of these two folders are named based on the tags for scene classification and object recognition.
The mp3 files in those sub-folders are the sound sources we used in our system.
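The sound database can be indexed by walking its two top-level folders. A minimal sketch, assuming the layout described above (soundDatabase/background/<tag>/*.mp3 and soundDatabase/soundableObject/<tag>/*.mp3); `index_sound_database` is a hypothetical name. The *.mp3 pattern is case-sensitive on Linux, so files ending in .MP3 would be skipped there.

```python
from pathlib import Path


def index_sound_database(root):
    """Map (group, tag) -> sorted list of mp3 paths.

    group is "background" or "soundableObject"; tag is the sub-folder name.
    """
    index = {}
    for group in ("background", "soundableObject"):
        group_dir = Path(root) / group
        if not group_dir.is_dir():
            continue
        for tag_dir in sorted(p for p in group_dir.iterdir() if p.is_dir()):
            # Only .mp3 files count as sound sources; other files are ignored.
            index[(group, tag_dir.name)] = sorted(tag_dir.glob("*.mp3"))
    return index
```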