Wujia / Haibo Wu


We Released WeShop Commercial Photography 1.5 — Some Recent Reflections 我们发布了WeShop商拍1.5版----分享一些最近的思考

Reflections on WeShop’s product evolution, AI commercial photography, and e-commerce creative workflows.

关于 WeShop 产品演进、AI 商拍场景和电商创意工作流的阶段性思考。

English translation provided here; the Zhihu link is the original source. 中文为知乎原文,英文为译文。

TL;DR: WeShop 1.5 makes meaningful progress in preserving product details and increasing freedom in background transformation, expanding the range of usable scenarios. For core issues such as faces, color, lighting, perspective, hands, and multi-person images, we still need to wait for version 2.0.

As usual, let’s look at some results first:

Turning original backgrounds into studio shots

[3 example images]

Switching between outdoor and indoor backgrounds

[3 example images]

Mannequins

[2 example images]

Magazine covers

[3 example images]

Product still life, preserving lighting and edge details

[4 example images]

For more examples, please visit: WeShop 1.5 release — WeShop

A necessary disclaimer: all views below are personal views only. They concern AI application startups, not companies competing in foundation-model races. These two kinds of companies follow very different startup models and should not be mixed together.

Two important characteristics of AI-native companies

For me, what counts as an AI-native company is a crucial question. If AI-native companies do not exist, then teams like ours building AI applications have no long-term commercial value; we would merely be doing trial and error on behalf of large companies. I personally believe this wave of large-model development will produce AI-native companies, and that new commercial giants will grow out of them.

Based on WeShop’s practice, here is my informal, unscientific summary of two core characteristics of AI-native companies:

1. The R&D process follows the principle of Prompt > LoRA > Finetune

When we discuss AI-native companies, we are really discussing where the dividend of this AI wave lies. Startup teams lack abundant manpower, capital, and resources, so their innovation must rest on relatively low trial-and-error costs, and prompting is the lowest-barrier innovation method that large models provide. In reality, many teams severely underestimate the potential of prompts and have never tested how far prompting alone can push a large model.

Next is LoRA. Without LoRA, today’s community would not be as vibrant. In most cases, LoRA costs slightly more than prompting, but a team can build an internal LoRA factory and run it as an assembly line. Combining different LoRAs often produces surprisingly strong product results.

Finally, there is finetuning. In 2023, many friends came to me to discuss AI startup ideas and usually mentioned finetuning: buying machines, preparing data, and training a vertical model. That made me nervous. The difficulty and cost of finetuning are far beyond what most people expect; starting with prompts and LoRA is a more reasonable choice. Of course, once a team reaches product-market fit (PMF), successfully finetuning a good model may be necessary to build a commercial moat.
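The Prompt > LoRA > Finetune ordering above can be read as "try the cheapest approach first and escalate only when it falls short." Here is a minimal Python sketch of that reading. The `Approach` class, the cost units, and the quality scores are all hypothetical values made up for illustration; they are not WeShop measurements or part of any real library.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Approach:
    name: str                      # e.g. "prompt", "lora", "finetune"
    relative_cost: float           # hypothetical cost units, not measured figures
    evaluate: Callable[[], float]  # returns a quality score in [0, 1]


def pick_cheapest_sufficient(
    approaches: List[Approach], quality_bar: float
) -> Optional[Approach]:
    """Try approaches from cheapest to most expensive; stop at the first that clears the bar."""
    for approach in sorted(approaches, key=lambda a: a.relative_cost):
        if approach.evaluate() >= quality_bar:
            return approach
    return None  # nothing is good enough yet


# Illustrative numbers only (assumptions, not real results).
ladder = [
    Approach("finetune", 100.0, lambda: 0.95),
    Approach("prompt", 1.0, lambda: 0.70),
    Approach("lora", 5.0, lambda: 0.85),
]

chosen = pick_cheapest_sufficient(ladder, quality_bar=0.80)
print(chosen.name if chosen else "nothing meets the bar")
```

With these made-up numbers the prompt-only approach misses the bar and LoRA is chosen, so the expensive finetune step is never evaluated. In practice the `evaluate` step would be a human or automated review of real outputs.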

2. Strong at single-point capabilities but weak at integration

In other words, current products are friendly to small customers. AI technology is still at an early stage, so there will inevitably be many problems, often with things that established players considered simple. In WeShop, for example, customers may ask: if the background can already be changed realistically, why can’t the shoulder strap on this dress stay thin, and why can’t a red dress be turned green? Similarly with LLMs: if a model can write long essays, why can’t it handle customer service properly?

For small customers, AI products solve core problems and bring exponential efficiency gains. But for large companies, because of scale and quality requirements, any step that cannot enter the business workflow reduces efficiency and is not worthwhile from an objective commercial perspective. This contradiction will be solved as technology improves and new workflows emerge, but it needs time. That gives many startup teams a window to grow. We must survive and grow before that critical point arrives.

AI applications urgently need hybrid product managers

I personally think this role is extremely scarce right now, and it is one of the key constraints limiting the emergence of excellent AI applications.

Product managers need sharp demand insight. Even in the mobile internet era, product managers with this ability were rare. Imagine a team with such a core person: their challenge is how to turn captured demand into a concrete application. Because of the uncertainty of AI technology, this process becomes unusually complex. In the mobile internet era, a few rounds of product requirements document (PRD) review were enough to determine whether a feature was feasible; there was little ambiguity. Whether the business would succeed on schedule was another matter. AI development is different: you may produce a demo in half a day, yet spend half a year struggling to launch a product, not to mention the growth, operations, and business-model building that come after PMF. If product and engineering teams cannot communicate effectively and align on standards, iteration efficiency drops sharply.

Ideally, product people should also understand the characteristics of AI technology. Although AI technology has in some ways a lower barrier to entry than past academic theory, building a coherent mental model still requires significant training and accumulation. This AI wave arrived fast, and talent reserves are seriously insufficient. But after a year, I believe many people have gradually adapted and are quietly practicing in different ways. These efforts have not yet broken out into the open, but we have reason to look forward to 2024.

Therefore, I hope people interested in product work actively embrace AI technology, dare to try, and dare to innovate. Especially at this stage, they should not blindly imitate. Innovation has the highest return on investment. Even if it does not succeed in the short term, the firsthand understanding gained through practice will significantly improve an individual’s and a team’s grasp of the AI industry.

We want to open source

WeShop was built with the help of many open-source projects and friends in the Stable Diffusion (SD) community. After careful consideration, we decided that we should contribute back to the open-source community. We plan to gradually open source WeShop’s frontend, backend, and some model-training tools.

We have released the WeShop personal edition, which can be seen as a variant of the SD web UI. At this stage, we have only released part of the frontend code; it is not yet complete, and please forgive the code quality. Compared with existing WebUIs, we added task management, asynchronous execution, and remote multi-user access, making it more suitable for using SD in real commercial environments.
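To make the three additions concrete (task management, asynchronous execution, multi-user access), here is a minimal Python sketch of a task queue built on the standard `asyncio` module. It is not WeShop’s code, which this post does not show. The `Task` and `TaskManager` names, the status strings, and the `asyncio.sleep` call standing in for a real SD generation call are all assumptions for illustration.

```python
import asyncio
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    task_id: int
    user: str
    prompt: str
    status: str = "queued"  # queued -> running -> done
    result: str = ""


class TaskManager:
    """Hypothetical in-memory task manager; a real service would persist tasks."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._queue: asyncio.Queue = asyncio.Queue()
        self.tasks: Dict[int, Task] = {}

    def submit(self, user: str, prompt: str) -> int:
        """Register a task and return immediately; the caller does not block on generation."""
        task = Task(next(self._ids), user, prompt)
        self.tasks[task.task_id] = task
        self._queue.put_nowait(task)
        return task.task_id

    def tasks_for(self, user: str) -> List[Task]:
        """Per-user view, so several users can share one backend."""
        return [t for t in self.tasks.values() if t.user == user]

    async def worker(self) -> None:
        """Pull tasks off the queue one at a time and process them."""
        while True:
            task = await self._queue.get()
            task.status = "running"
            await asyncio.sleep(0.01)  # stand-in for a real image generation call
            task.result = f"image for: {task.prompt}"
            task.status = "done"
            self._queue.task_done()

    async def drain(self) -> None:
        """Wait until every submitted task has been processed."""
        await self._queue.join()


async def demo() -> TaskManager:
    manager = TaskManager()
    worker = asyncio.create_task(manager.worker())
    manager.submit("alice", "studio background")
    manager.submit("bob", "outdoor street")
    manager.submit("alice", "magazine cover")
    await manager.drain()
    worker.cancel()  # stop the background worker once the queue is empty
    return manager


manager = asyncio.run(demo())
for t in manager.tasks.values():
    print(t.task_id, t.user, t.status)
```

The point of the design is that `submit` returns a task ID right away while a background worker does the slow generation. This is what lets a web frontend serve several remote users without any of them blocking on another’s job.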

Because the team is small, a full open-source release is still in progress. We need time to finish the engineering cleanup of the codebase, so it has not yet been published on GitHub. We have formed a dedicated group and will release the source code step by step. Anyone interested is welcome to learn more through the Feishu document.

Feishu document: WeShop open-source notes

Some useful resources

Here are some SD- or LLM-related resources that I think are good.

Professor Li Jian from Peking University has a talk on LCM; the whole series is also very good.

Bilibili: Professor Li Jian on LCM

The mathematics of deep learning series from IDEA, founded by Harry Shum:

Bilibili: Mathematics in Deep Learning

Mu Li’s course. Unfortunately, he stopped updating after starting his company.

Bilibili: DALL·E 2 paper reading

Andrew Ng’s courses:

DeepLearning.AI Short Courses

YouTube:

YouTube: https://www.youtube.com/watch?v=T0Qxzf0eaio

Song Yang:

YouTube: https://www.youtube.com/watch?v=y8q3gh61OY0

A broad overview:

YouTube: https://www.youtube.com/watch?v=cS6JQpEY9cs

The Lex Fridman interview with Ilya:

YouTube: Lex Fridman interview with Ilya

LLM product recommendations

There are many resources I did not include. Since I now use LLMs to help me understand text-heavy information more deeply, here are a few tools I often use:

For questions about academic papers, ChatGPT tends to be too verbose, though this may also be because I have not pushed my prompting far enough.

For overseas use, I recommend Claude 2:

Claude

For domestic use, I recommend:

Kimi

For everyday questions, I recommend ChatGPT overseas; domestically I recommend Doubao and Tongyi. Doubao’s voice interaction is quite good.

Other posts about WeShop

Wu Haibo: Reporting to everyone that our e-commerce AI model product WeShop beta is open for testing

Wu Haibo: Trying to answer frequently asked AIGC product business questions using WeShop as an example

Wu Haibo: Thoughts from building WeShop — written after the official WeShop launch

PS: If any of the product images above infringe rights, please contact me and I will delete them.

太长不看版:WeShop 1.5版在保持商品细节和提高背景变换自由度方面有较大进步,使其扩大了应用场景。在人脸、色彩、光影、透视、手部及多人图像等核心问题上,需要期待2.0版。

老规矩,先看一波效果:

各种原图背景转棚拍

[示例图 3 张]

室外室内背景互换

[示例图 3 张]

人台

[示例图 2 张]

杂志封面

[示例图 3 张]

商品静物图,光影和边缘细节的保持

[示例图 4 张]

更多案例请访问:WeShop唯象妙境1.5版本上线 - WeShop唯象妙境

必须声明:以下所有观点皆为个人立场,仅涉及AI应用型创业公司,不包括参与大模型竞赛的公司。这两类公司采取迥异的创业模式,不能混为一谈。

AI Native公司的两个重要特征

对我来说,什么是AI Native公司是至关重要的问题。如果AI Native不存在,像我们这种做AI应用的创业团队将无长期商业价值,我们不过是为大公司试错而已。这一轮大模型的发展,我个人坚信会有AI Native的存在,会有原生的商业巨头成长出来。

通过WeShop的实践,我不负责任的总结AI Native公司有两个核心特征:

1. 研发流程符合Prompt > LoRA > Finetune原则

我们讨论AI Native,其实是在讨论这波AI的红利是什么。对于创业团队,没有人力、资金和资源。它的创新必须建立在试错成本不高的前提下,prompt正是大模型时代给大家提供最低门槛的创新手段。实际上很多团队严重低估了prompt的潜力,没有对大模型的极限进行过prompt的边界测试。

其次是LoRA,没有LoRA技术,就不会有今天社区的繁荣。LoRA的成本一般情况下略大于Prompt,但LoRA在团队内部可以形成LoRA工厂,实现流水线生产。通过集成各种LoRA经常能制造出非常惊艳的产品效果。

最后是微调(finetune),2023年很多朋友找我交流AI创业的想法,一般都会提到要微调,购买机器,准备数据,训练一个行业模型。这时我就紧张了,微调模型的难度和成本,其实远超过大家的预期,从prompt和LoRA开始是更合理的选择。当然,在PMF后,需要成功微调一个好的模型,形成商业护城河。

2. 单点能力突出但整合性差

也就是当前产品对小客户友好,AI技术还处于早期发展阶段,必然会有大量的问题,而且这些问题往往是在原先玩家觉得比较简单的问题。比如在WeShop,客户会觉得背景都能换的逼真了,咋就做不到这个裙子的肩带不要变粗,咋就不能把红衣服变绿衣服。又比如在LLM中,长篇大论的文章都能写了,怎么就不能好好做个客服呢。

对于小客户来讲,AI产品解决的是核心问题,是指数级的效率提升。但对于大公司来讲,由于规模和质量的要求,无法进入它商业流程的环节都是降低效率的,从客观商业考量就不合算。这个矛盾会随着技术的发展、新的工作流出现得到解决,但它需要时间,这就给了很多创业团队发展的时机,我们必须在那个临界点到来前成长到能活下去的状态。

AI应用急需复合型产品经理

个人认为这个角色目前是极度稀缺的状态,也是限制优秀AI应用广泛涌现的关键因素之一。

产品经理需具备敏锐的需求洞察力。即便在移动互联网时代,具有此能力的产品经理亦屈指可数。设想若团队拥有这样一位核心人物,他们面对的挑战在于如何将捕捉到的需求转化为具体应用。由于AI技术的不确定性,这一过程变得异常复杂。在移动互联网时代,几轮产品需求文档(PRD)评审即可明确产品功能是否可行,不存在模糊地带。当然,业务能否如期成功又是另一回事。但AI开发的特点是,虽然可能半天就能出示一个demo,却可能半年也难以推出成品,更不用说进入产品市场契合(PMF)之后的增长、运营和商业模型构建。产研团队内部如果无法有效沟通,对齐标准,将大幅降低迭代效率。

理想的情况是,产品同学也能掌握AI技术的特征。尽管AI技术相比过去的学术理论整体门槛有所降低,但构建一个自洽的认知体系仍需要大量训练和积累。这波AI浪潮来势汹汹,人才储备严重不足。然而一年过去,我相信许多人已经逐渐适应,正悄悄进行各种实践,尽管这些努力尚未爆发,我们有理由期待2024年的到来。

因此,希望对产品行业感兴趣的朋友们积极拥抱AI技术,敢于尝试,勇于创新。特别是,在这个阶段,不应该盲目模仿。创新是回报率最高的投资,即使短期内创新未必成功,通过实践获取的真知也能显著提升个人及团队对AI行业的理解。

我们要开源

WeShop产品的构建受益于广泛的开源项目及SD社区朋友的支持。在深思熟虑后,我们决定亦应贡献于开源社区。计划将WeShop的前端、后端及部分模型训练工具逐步开放源代码。

目前已发布WeShop个人版,它可视为SD的webui的一个变体。当前阶段,我们仅发布了部分前端代码,尚不完整,代码质量亦请各位海涵。与现有WebUI相比,我们新增了任务管理、异步执行和远程多用户访问等功能,更适合商业环境中实际应用SD。

鉴于团队规模有限,全面开源仍在路上,我们需要时间完成代码的工程化改造,因此尚未在GitHub上发布。我们已成立专门小组,未来将分步公开源代码。欢迎对此感兴趣的同仁通过飞书文档了解更多详情。

飞书文档:WeShop 开源说明

一些有趣的资料分享

分享一些我觉得不错的SD或LLM相关的资料给大家。

北大李建教授关于LCM的talk,整个专辑也非常不错。

Bilibili:北大李建教授关于 LCM 的 talk

沈向洋老师创办的IDEA研究院的深度学习的数学专辑:

Bilibili:深度学习中的数学

李沐老师的课程,可惜李沐老师创业后不更新了。

Bilibili:DALL·E 2 论文精读

Andrew Ng的课程:

DeepLearning.AI Short Courses

YouTube:

YouTube:https://www.youtube.com/watch?v=T0Qxzf0eaio

Song Yang:

YouTube:https://www.youtube.com/watch?v=y8q3gh61OY0

大串讲:

YouTube:https://www.youtube.com/watch?v=cS6JQpEY9cs

Lex Fridman和Ilya的访谈:

YouTube:Lex Fridman 和 Ilya 访谈

LLM产品推荐

还有很多资料没有放,因为我现在使用LLM来辅助深入理解文本类的信息,推荐几个常用的:

在论文学术问题上,ChatGPT废话太多了(也有可能我对LLM的prompt实践不够极致)

国外推荐Claude2:

Claude

国内推荐:

Kimi

日常问题,国外推荐ChatGPT,国内推荐豆包、通义,豆包的语音交互做得不错。

其他关于WeShop

吴海波:和大家汇报下我们电商AI模特产品WeShop beta版本开放测试

吴海波:以WeShop为例尝试回答一些经常被问的AIGC产品业务问题

吴海波:谈谈做WeShop过程中对AIGC产品的一些思考----写在WeShop正式版上线

PS:上述产品图片如有侵权,请联系我删除。