METHOD AND APPARATUS FOR VIDEO OBJECT SEGMENTATION

Bibliographic Details
Title: METHOD AND APPARATUS FOR VIDEO OBJECT SEGMENTATION
Document Number: 20120294530
Publication Date: November 22, 2012
Appl. No: 13/522589
Application Filed: January 20, 2011
Abstract: Methods and apparatus for video object segmentation are provided, suitable for use in a super-resolution system. The method comprises alignment of frames of a video sequence, pixel alignment to generate initial foreground masks using a similarity metric, consensus filtering to generate an intermediate foreground mask, and refinement of the mask using spatio-temporal information from the video sequence. In various embodiments, the similarity metric is computed using a sum of squared differences approach, a correlation, or a modified normalized correlation metric. Soft thresholding of the similarity metric is also used in one embodiment of the present principles. Weighting factors are also applied to certain critical frames in the consensus filtering stage in one embodiment using the present principles.
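Illustrative note: the initial-mask step named in the abstract can be sketched as follows, assuming the reference frame has already been aligned to the current frame. This is a minimal sketch and not the applicants' implementation. It uses a mean sum of squared differences over a square neighborhood with a hard threshold. The function names (patch, ssd, initial_foreground_mask), the border clamping, the neighborhood size, and the threshold value are all assumptions made for illustration. The soft thresholding and the modified normalized correlation metric mentioned in the abstract are not shown.

```python
def patch(frame, r, c, half):
    """Return the (2*half+1)^2 neighborhood around (r, c), clamping at the borders."""
    rows, cols = len(frame), len(frame[0])
    return [
        frame[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
        for dr in range(-half, half + 1)
        for dc in range(-half, half + 1)
    ]


def ssd(a, b):
    """Sum of squared differences between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_foreground_mask(current, reference, half=1, threshold=100.0):
    """Mark a pixel as foreground (1) when its neighborhood differs strongly
    from the same neighborhood in the aligned reference frame.

    The threshold on the mean squared difference is a hypothetical value.
    """
    rows, cols = len(current), len(current[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a = patch(current, r, c, half)
            b = patch(reference, r, c, half)
            if ssd(a, b) / len(a) > threshold:
                mask[r][c] = 1
    return mask


# Toy frames: a flat background and a current frame with one bright pixel.
reference = [[0] * 5 for _ in range(5)]
current = [row[:] for row in reference]
current[2][2] = 100
mask = initial_foreground_mask(current, reference)
```

In this toy case every pixel whose 3x3 neighborhood contains the bright pixel is flagged, so the mask is a 3x3 block of ones centered on it.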
Inventors: Bhaskaranand, Malavika (Goleta, CA, US); Bhagavathy, Sitaram (Plainsboro, NJ, US)
Claim: 1. A method for video object segmentation, comprising: aligning one or more reference frames with a current frame containing a video object; generating a foreground mask for a current frame based on a neighborhood similarity metric; and refining the foreground mask by using information from at least one video frame or mask.
Claim: 2. The method of claim 1, further comprising: generating initial foreground masks for a current frame with respect to each aligned reference frame based on a neighborhood similarity metric; combining information from the initial foreground masks to generate a single intermediate foreground mask for the current frame before refining the intermediate foreground mask.
Claim: 3. The method of claim 1, wherein the information from at least one video frame or mask used in the refining step is some combination of spatial and temporal information.
Claim: 4. The method of claim 2 wherein the combining step is performed using a consensus filtering mechanism.
Claim: 5. The method of claim 1, wherein said aligning step uses multi-hop homography between frames.
Claim: 6. The method of claim 2, wherein the initial foreground masks are generated on a block basis.
Claim: 7. The method of claim 2, wherein said initial foreground masks are generated using a normalized correlation metric.
Claim: 8. The method of claim 2, wherein the initial foreground masks are generated using weighting factors that weigh individual frames.
Claim: 9. The method of claim 2, wherein a three-level intermediate mask is used when generating foreground masks.
Claim: 10. The method of claim 2, wherein morphological operations are used to combine information from the initial foreground masks to generate a single mask for the current frame.
Claim: 11. An apparatus for video object segmentation, comprising: a memory and frame alignment mechanism that stores a plurality of frames of video and aligns one or more reference frames with a current frame containing a video object; circuitry that generates an intermediate mask for the current frame based on a neighborhood similarity metric; and a processor that refines the intermediate mask by using information from at least one video frame or mask.
Claim: 12. The apparatus of claim 11, further comprising: circuitry that generates initial foreground masks for a current frame with respect to each aligned reference frame based on a neighborhood similarity metric; a generator that combines information from the initial foreground masks to generate an intermediate mask for the current frame before refining the intermediate foreground mask.
Claim: 13. The apparatus of claim 11, wherein the processor uses information from at least one video frame or mask that is some combination of spatial and temporal information.
Claim: 14. The apparatus of claim 12, wherein the generator combines information using a consensus filtering mechanism.
Claim: 15. The apparatus of claim 11, wherein said memory and frame alignment mechanism uses multi-hop homography between frames.
Claim: 16. The apparatus of claim 11, wherein the circuitry that generates initial foreground masks generates masks on a block basis.
Claim: 17. The apparatus of claim 11, wherein the circuitry that generates initial foreground masks generates them using a normalized correlation metric.
Claim: 18. The apparatus of claim 11, wherein the circuitry that generates initial foreground masks generates them using weighting factors that weight individual frames.
Claim: 19. The apparatus of claim 11, wherein the circuitry that generates initial foreground masks generates them using a three-level intermediate mask.
Claim: 20. The apparatus of claim 11, wherein said processor uses morphological operations to combine information from the foreground masks to generate a single mask for the current frame.
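Illustrative note: claims 2, 4 and 8 describe combining the initial foreground masks into a single intermediate mask, optionally with per-frame weighting factors. One plausible reading is a weighted per-pixel vote, sketched below. This is a hedged example only: the function name consensus_mask, the min_fraction parameter, and the majority rule itself are assumptions, and the record does not specify the actual consensus filter, the three-level mask, or the morphological operations.

```python
def consensus_mask(masks, weights=None, min_fraction=0.5):
    """Combine binary masks (one per reference frame) by weighted voting.

    A pixel is kept as foreground when the weighted share of masks that
    flag it exceeds min_fraction. Weights and min_fraction are hypothetical.
    """
    if not masks:
        raise ValueError("at least one mask is required")
    if weights is None:
        weights = [1.0] * len(masks)
    if len(weights) != len(masks):
        raise ValueError("one weight per mask is required")
    total = sum(weights)
    rows, cols = len(masks[0]), len(masks[0][0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vote = sum(w * m[r][c] for w, m in zip(weights, masks))
            if vote > min_fraction * total:
                out[r][c] = 1
    return out


# Three one-row masks, e.g. from three aligned reference frames.
initial_masks = [[[1, 0, 0]], [[1, 1, 0]], [[0, 1, 1]]]
unweighted = consensus_mask(initial_masks)
# Give the third frame more influence, as a stand-in for a "critical" frame.
weighted = consensus_mask(initial_masks, weights=[1.0, 1.0, 3.0])
```

With equal weights the result is [[1, 1, 0]]; weighting the third mask more heavily shifts it to [[0, 1, 1]], which shows how a per-frame weight can change the consensus.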
Current U.S. Class: 382/173
Current International Class: 06
Accession Number: edspap.20120294530
Database: USPTO Patent Applications
Language: English