




Trustworthy Large Visual Generation Models Constrained by Physical Conditions

Visual generative model
- Input, Output
- VAE: maximize variational lower bound

Video generative methods
- The field of video generation has seen rapid development, reaching several milestones...
- VAE: maximize variational lower bound
- GAN: Adversarial training
- Flow-based models: Invertible transform of distributions
- Diffusion models: Gradually add Gaussian noise and then reverse

Diffusion for visual generation (1)
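As a rough illustration of the "gradually add Gaussian noise" half of a diffusion model, the sketch below samples a noised image x_t directly from a clean image x_0 using the closed-form DDPM forward process q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I). The linear beta schedule, the step count T = 1000, the random stand-in image, and the function names are assumptions chosen for illustration; none of them come from the slides.

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.uniform(-1.0, 1.0, size=(3, 32, 32))  # stand-in for a normalized image
x_early, noise_early = q_sample(x0, 10, alpha_bar, rng)   # still close to x0
x_late, noise_late = q_sample(x0, 999, alpha_bar, rng)    # almost pure noise
```

In a DDPM, a denoising network is trained to predict `noise` from `(x_t, t)`, and generation runs the process in reverse, removing noise step by step.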
- Denoising Diffusion Probabilistic Models (DDPMs)

Diffusion for visual generation (2)
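The SDE view replaces discrete noising steps with a continuous-time process. As a sketch, the snippet below simulates one common choice, the variance-preserving SDE dx = -0.5 * beta(t) * x dt + sqrt(beta(t)) dw, with the Euler-Maruyama method. The linear beta(t) range, the step count, the starting value, and all names are illustrative assumptions rather than details from the slides.

```python
import numpy as np

def beta(t, beta_min=0.1, beta_max=20.0):
    """Assumed linear noise rate beta(t) for t in [0, 1]."""
    return beta_min + t * (beta_max - beta_min)

def simulate_vp_sde(x0, n_steps=1000, rng=None):
    """Euler-Maruyama simulation of dx = -0.5*beta(t)*x dt + sqrt(beta(t)) dw."""
    rng = np.random.default_rng() if rng is None else rng
    dt = 1.0 / n_steps
    x = np.array(x0, dtype=float)
    for i in range(n_steps):
        t = i * dt
        drift = -0.5 * beta(t) * x
        diffusion = np.sqrt(beta(t))
        x = x + drift * dt + diffusion * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(0)
x0 = np.full(10_000, 3.0)          # every sample starts at the same data value
x1 = simulate_vp_sde(x0, rng=rng)  # state at t = 1
```

Although every sample starts at 3.0, the samples at t = 1 should be close to a standard normal distribution. That is the prior from which a learned reverse-time SDE would start generating.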
- Stochastic Differential Equations (Score SDEs)

Key Elements of visual Diffusion Models
- Pixel diffusion (original input)
- Latent space diffusion
- U-Net
- Transformer

Sora, breakthrough
- Consistency: consistency in 3D rendering, long-range coherence, and object permanence.
- High fidelity.
- Surprising length: extended video length capability (Sora: 1 minute vs. previous systems: seconds).
- Flexible resolution: generation of videos across various durations, aspect ratios, and resolutions.

Sora, key technologies
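Two of the ingredients on this slide, video tokenization and support for varying resolutions and aspect ratios, both involve turning a video into a sequence of tokens. The sketch below shows only the simplest form of that idea: cutting a video into fixed-size spacetime patches and flattening each one. It is not the MAGViT or NaViT method, and the patch sizes and the function name `patchify_video` are hypothetical choices for illustration.

```python
import numpy as np

def patchify_video(video, pt=2, ph=16, pw=16):
    """Split a (T, H, W, C) video into flattened spacetime patches.

    Returns an array of shape (num_tokens, pt * ph * pw * C).
    Frames and borders that do not fill a whole patch are cropped.
    """
    T, H, W, C = video.shape
    nt, nh, nw = T // pt, H // ph, W // pw
    v = video[: nt * pt, : nh * ph, : nw * pw]
    v = v.reshape(nt, pt, nh, ph, nw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)  # (nt, nh, nw, pt, ph, pw, C)
    return v.reshape(nt * nh * nw, pt * ph * pw * C)

landscape = np.zeros((8, 64, 128, 3), dtype=np.float32)   # 8 frames, 64x128
portrait = np.zeros((16, 128, 64, 3), dtype=np.float32)   # 16 frames, 128x64
tokens_a = patchify_video(landscape)  # shape (128, 1536)
tokens_b = patchify_video(portrait)   # shape (256, 1536)
```

Clips with different durations and aspect ratios produce token sequences of different lengths but the same token width, so a sequence model such as a Transformer can consume both.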
- The DiT framework by Meta (2022.12) is designed for video processing.
- Google's MAGViT (2022.12) focuses on video tokenization.
- Google DeepMind introduced NaViT (2023.07) to support various resolutions and aspect ratios.
- OpenAI's DALL-E 3 (2023.09) enhances video caption generation for improved conditioned video creation.

Modeling the physical world
- We know that the real physical world is very complicated to model.

probabilistic
- Bayesian inference;
- probabilistic graphical models.

deterministic
- mathematical equations;
- physics-based simulation;
- control theory.

Key elements of a physical world
- Given a Sora demo (the woman walking down a Tokyo street), the key elements of a physical world, viewed in graphics terms...
- Appearance
- Geometry
- Lighting
- Motion & Animation
- Audio

Modeling the physical world
- [CVPR] Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
- [Figure panels: Espresso, Chick-Chicken, Split-Cookie, Flame-Steak]

It is hard to model the physical world
- In fact, the world is hard to model in a probabilistic way.
- Sora resource consumption...
  - 1 billion images;
  - 1 million hours of video data;
  - 10 trillion tokens after tokenizing images and videos;
  - training with ~5,000 A100s in parallel.
- Sora failure case in geometry and appearance.
- Sora failure case in lighting.
- Sora failure case in motion and animation.
- VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model
  - Geometric enhancement is still needed for multi-view images.
  - From a static aspect, SVD is able to model multi-view images.
- STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
  - From a temporal aspect...
- Ilya Sutskever: compression is generalization.
- The best lossless compression for a dataset is the best generalization for data outside the dataset.

Apply the deterministic conditions
- Different representations of deterministic conditions in the physical world.
- Much less data and parameters!
- Geometry, Lighting, Motion & Animation
- There are two ways to inject deterministic information.
- [Figure panels: deterministic #1, deterministic #2]

Image Human Animation
- Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Image Portrait Animation
- Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Dynamic Protein Structure Prediction
- 4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Temporal Alignment