We Released WeShop Commercial Photography 1.5 — Some Recent Reflections 我们发布了WeShop商拍1.5版----分享一些最近的思考

2024-02-05 · Chinese original on Zhihu 知乎原文

Reflections on WeShop’s product evolution, AI commercial photography, and e-commerce creative workflows.

关于 WeShop 产品演进、AI 商拍场景和电商创意工作流的阶段性思考。

English translation provided here; the Zhihu link is the original source. 中文为知乎原文，英文为译文。

TL;DR: WeShop 1.5 makes meaningful progress in preserving product details and increasing freedom in background transformation, expanding the range of usable scenarios. For core issues such as faces, color, lighting, perspective, hands, and multi-person images, we still need to wait for version 2.0.

As usual, let’s look at some results first:

Turning original backgrounds into studio shots

Switching between outdoor and indoor backgrounds

Mannequins

Magazine covers

Product still life, preserving lighting and edge details

For more examples, please visit: WeShop 1.5 release — WeShop

A necessary disclaimer: all views below are personal views only. They concern AI application startups, not companies competing in foundation-model races. These two kinds of companies follow very different startup models and should not be mixed together.

Two important characteristics of AI-native companies

For me, what counts as an AI-native company is a crucial question. If AI-native companies do not exist, then teams like ours building AI applications will have no long-term commercial value; we would merely be doing trial and error for large companies. With this wave of large-model development, I personally believe AI-native companies will exist, and native commercial giants will grow out of it.

Based on WeShop’s practice, I will venture an informal, unscientific summary of two core characteristics of AI-native companies:

1. The R&D process follows the principle of Prompt > LoRA > Finetune

When we discuss AI-native companies, we are really discussing where the dividend of this AI wave lies. Startup teams lack abundant manpower, capital, and resources, so their innovation must rest on low trial-and-error costs, and prompting is the lowest-barrier way to innovate that large models offer. In practice, many teams severely underestimate the potential of prompts and have never systematically tested how far prompting alone can push a large model.

Next is LoRA. Without LoRA, today’s community would not be as vibrant. In most cases, LoRA costs slightly more than prompting, but inside a team it can form a LoRA factory and enable assembly-line production. By integrating different LoRAs, teams can often create surprisingly strong product results.
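To make the idea of integrating several LoRAs concrete, here is a minimal sketch of the standard LoRA arithmetic: each adapter contributes a scaled low-rank update B·A that is added to a base weight matrix. This is a toy illustration in plain Python, not WeShop’s pipeline; the names merge_loras, style_lora, and detail_lora, the matrix shapes, and the scale values are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product; adequate for tiny illustrative shapes."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def merge_loras(
    base: Matrix, adapters: Sequence[Tuple[Matrix, Matrix, float]]
) -> Matrix:
    """Return base + sum(scale * B @ A) over all adapters.

    Each adapter is a (B, A, scale) triple, where B is d_out x r and
    A is r x d_in. The base matrix is copied, not modified in place.
    """
    merged = [row[:] for row in base]
    for b, a, scale in adapters:
        delta = matmul(b, a)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged


# Hypothetical 2x2 base weight and two rank-1 adapters (illustrative values).
base = [[1.0, 0.0], [0.0, 1.0]]
style_lora = ([[1.0], [0.0]], [[0.0, 2.0]], 0.5)    # (B, A, scale)
detail_lora = ([[0.0], [1.0]], [[4.0, 0.0]], 0.25)  # (B, A, scale)

merged = merge_loras(base, [style_lora, detail_lora])
```

Because each adapter is only a small pair of matrices plus a scale, adapters can be trained separately and mixed at load time, which is what makes an assembly-line approach plausible. In real systems this merge is handled by the model library rather than written by hand.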

Finally there is finetuning. In 2023, many friends came to me to discuss AI startup ideas, and they usually mentioned finetuning: buying machines, preparing data, and training a vertical model. That made me nervous. The difficulty and cost of finetuning far exceed what most people expect; starting with prompts and LoRA is the more reasonable choice. Of course, after reaching product-market fit (PMF), successfully finetuning a good model may be necessary to form a commercial moat.

2. Strong at single-point capabilities but weak at integration

In other words, current products are friendly to small customers. AI technology is still at an early stage, so there will inevitably be many problems, often in areas that earlier players considered simple. With WeShop, for example, customers ask: if the background can already be swapped so realistically, why can’t the shoulder strap on this dress stay thin, and why can’t a red garment be turned green? With LLMs, similarly: if a model can write long essays, why can’t it handle customer service properly?

For small customers, AI products solve core problems and bring exponential efficiency gains. But for large companies, because of scale and quality requirements, any step that cannot enter the business workflow reduces efficiency and is not worthwhile from an objective commercial perspective. This contradiction will be solved as technology improves and new workflows emerge, but it needs time. That gives many startup teams a window to grow. We must survive and grow before that critical point arrives.

AI applications urgently need hybrid product managers

I personally think this role is extremely scarce right now, and it is one of the key constraints limiting the emergence of excellent AI applications.

Product managers need sharp insight into demand. Even in the mobile internet era, product managers with this ability were rare. Imagine a team that has such a core person: the challenge is then how to turn the demand they have captured into a concrete application. Because AI technology is uncertain, this process becomes unusually complex. In the mobile internet era, a few rounds of product requirements document (PRD) review were enough to determine whether a feature was feasible; there was little ambiguity. Whether the business would succeed on schedule was another matter. AI development is different: you may produce a demo in half a day, yet spend half a year struggling to launch a product, not to mention the growth, operations, and business-model building that follow PMF. If product and engineering teams cannot communicate effectively and align on standards, iteration efficiency drops sharply.

Ideally, product people should also understand the characteristics of AI technology. Although the barrier to entry for AI technology is lower overall than for earlier academic theory, building a coherent mental model still requires significant training and accumulation. This AI wave arrived fast, and talent reserves are seriously insufficient. But a year on, I believe many people have gradually adapted and are quietly experimenting in different ways. These efforts have not yet broken out, but we have reason to look forward to 2024.

Therefore, I hope people interested in product work actively embrace AI technology, dare to try, and dare to innovate. Especially at this stage, they should not blindly imitate. Innovation has the highest return on investment. Even if it does not succeed in the short term, the firsthand understanding gained through practice will significantly improve an individual’s and a team’s grasp of the AI industry.

We want to open source

The construction of WeShop has benefited from many open-source projects and friends in the Stable Diffusion community. After careful consideration, we decided that we should also contribute to the open-source community. We plan to gradually open source WeShop’s frontend, backend, and some model-training tools.

We have released the WeShop personal edition, which can be seen as a variant of the SD web UI. At this stage, we have only released part of the frontend code; it is not yet complete, and please forgive the code quality. Compared with existing WebUIs, we added task management, asynchronous execution, and remote multi-user access, making it more suitable for using SD in real commercial environments.
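As a rough illustration of what task management and asynchronous execution can look like in a multi-user image-generation service, here is a minimal asyncio sketch. It is not WeShop’s code: the Task and TaskManager classes, the fake_generate job, and the status names are hypothetical, and a real deployment would also need persistence, authentication, and GPU scheduling.

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, Optional


@dataclass
class Task:
    user: str
    payload: Dict[str, Any]
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "queued"  # queued -> running -> done | failed
    result: Optional[Any] = None


class TaskManager:
    """Queues jobs from many users and runs them in a background worker."""

    def __init__(self, run_job: Callable[[Dict[str, Any]], Awaitable[Any]]):
        self._run_job = run_job
        self._queue: "asyncio.Queue[Task]" = asyncio.Queue()
        self.tasks: Dict[str, Task] = {}

    def submit(self, user: str, payload: Dict[str, Any]) -> str:
        """Register a task and return its id immediately (non-blocking)."""
        task = Task(user=user, payload=payload)
        self.tasks[task.id] = task
        self._queue.put_nowait(task)
        return task.id

    async def worker(self) -> None:
        """Process queued tasks one at a time until cancelled."""
        while True:
            task = await self._queue.get()
            task.status = "running"
            try:
                task.result = await self._run_job(task.payload)
                task.status = "done"
            except Exception as exc:  # record the failure, keep the worker alive
                task.result = repr(exc)
                task.status = "failed"
            finally:
                self._queue.task_done()

    async def drain(self) -> None:
        """Wait until every submitted task has been processed."""
        await self._queue.join()


async def fake_generate(payload: Dict[str, Any]) -> str:
    """Stand-in for an image-generation call (hypothetical)."""
    await asyncio.sleep(0)
    if "prompt" not in payload:
        raise ValueError("missing prompt")
    return f"image for {payload['prompt']}"


async def demo():
    manager = TaskManager(fake_generate)
    worker = asyncio.create_task(manager.worker())
    ok_id = manager.submit("alice", {"prompt": "studio shot"})
    bad_id = manager.submit("bob", {})
    await manager.drain()
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return manager.tasks[ok_id], manager.tasks[bad_id]


ok_task, bad_task = asyncio.run(demo())
```

The point of the design is that submit returns a task id right away, so a remote client can poll the task’s status instead of holding a connection open while the image is generated.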

Because the team is small, full open source is still on the way. We need time to complete engineering work on the codebase, so it has not yet been released on GitHub. We have formed a dedicated group and will disclose the source code step by step. People interested in this are welcome to learn more through the Feishu document.

Feishu document: WeShop open-source notes

Some useful resources

Here are some SD- or LLM-related resources that I think are good.

Professor Li Jian from Peking University has a talk on LCM; the whole series is also very good.

Bilibili: Professor Li Jian on LCM

The mathematics of deep learning series from IDEA, founded by Harry Shum:

Bilibili: Mathematics in Deep Learning

Mu Li’s course. Unfortunately, he stopped updating it after starting his company.

Bilibili: DALL·E 2 paper reading

Andrew Ng’s courses:

DeepLearning.AI Short Courses

YouTube:

YouTube: https://www.youtube.com/watch?v=T0Qxzf0eaio

Song Yang:

YouTube: https://www.youtube.com/watch?v=y8q3gh61OY0

A broad overview:

YouTube: https://www.youtube.com/watch?v=cS6JQpEY9cs

The Lex Fridman interview with Ilya:

YouTube: Lex Fridman interview with Ilya

LLM product recommendations

There are many resources I did not include. Since I now use LLMs to help me understand text-heavy information more deeply, here are a few tools I often use:

For questions about academic papers, ChatGPT tends to be too verbose, though this may also be because I have not yet pushed my prompting far enough.

For overseas use, I recommend Claude 2:

Claude

For domestic use, I recommend:

Kimi

For everyday questions, I recommend ChatGPT overseas; domestically I recommend Doubao and Tongyi. Doubao’s voice interaction is quite good.

必须声明:以下所有观点皆为个人立场，仅涉及AI应用型创业公司，不包括参与大模型竞赛的公司。这两类公司采取迥异的创业模式，不能混为一谈。

AI Native公司的两个重要特征

对我来说，什么是AI Native公司是至关重要的问题。如果AI Native不存在，像我们这种做AI应用的创业团队将无长期商业价值,我们不过是为大公司试错而已。这一轮大模型的发展，我个人坚信会有AI Native的存在，会有原生的商业巨头成长出来。

通过WeShop的实践，我不负责任的总结AI Native公司有两个核心特征:

研发流程符合Prompt > LoRA > Finetune原则

我们讨论AI Native，其实是在讨论这波AI的红利是什么。对于创业团队，没有人力、资金和资源。它的创新必须建立在试错成本不高的前提下，prompt正是大模型时代给大家提供最低门槛的创新手段。实际上很多团队严重低估了prompt的潜力,没有对大模型的极限进行过prompt的边界测试。

其次是LoRA，没有LoRA技术，就不会有今天社区的繁荣。LoRA的成本一般情况下略大于Prompt，但LoRA在团队内部可以形成LoRA工厂，实现流水线生产。通过集成各种LoRA经常能制造出非常惊艳的产品效果。

最后是微调(finetune)，2023年很多朋友找我交流AI创业的想法，一般都会提到要微调，购买机器，准备数据，训练一个行业模型。这时我就紧张了，微调模型的难度和成本，其实远超过大家的预期，从prompt和lora开始是更合理的选择。当然，在PMF后上需要成功微调一个好的模型，形成商业护城河。

2. 单点能力突出但整合性差

也就是当前产品对小客户友好，AI技术还处于早期发展阶段，必然会有大量的问题，而且这些问题往往是在原先玩家觉得比较简单的问题。比如在WeShop，客户会觉得背景都能换的逼真了，咋就做不到这个裙子的肩带不要变粗，咋就不能把红衣服变绿衣服。又比如在LLM中，长篇大论的文章都能写了，怎么就不能好好做个客服呢。

对于小客户来讲，AI产品解决的是核心问题，是指数级的效率提升。但对于大公司来讲，由于规模和质量的要求，无法进入它商业流程的环节都是降低效率的，从客观商业考量就不合算。这个矛盾会随着技术的发展、新的工作流出现得到解决，但它需要时间，这就给了很多创业团队发展的时机，我们必须在那个临界点到来前成长到能活下去的状态。

AI应用急需复合型产品经理

个人认为这个角色目前是极度稀缺的状态，也是限制优秀AI应用广泛涌现的关键因素之一。

产品经理需具备敏锐的需求洞察力。即便在移动互联网时代，具有此能力的产品经理亦屈指可数。设想若团队拥有这样一位核心人物，他们面对的挑战在于如何将捕捉到的需求转化为具体应用。由于AI技术的不确定性，这一过程变得异常复杂。在移动互联网时代，几轮产品需求文档（PRD）评审即可明确产品功能是否可行，不存在模糊地带。当然，业务能否如期成功又是另一回事。但AI开发的特点是，虽然可能半天就能出示一个demo，却可能半年也难以推出成品，更不用说进入产品市场契合（PMF）之后的增长、运营和商业模型构建。产研团队内部如果无法有效沟通，对齐标准，将大幅降低迭代效率。

理想的情况是，产品同学也能掌握AI技术的特征。尽管AI技术相比过去的学术理论整体门槛有所降低，但构建一个自洽的认知体系仍需要大量训练和积累。这波AI浪潮来势汹汹，人才储备严重不足。然而一年过去，我相信许多人已经逐渐适应，正悄悄进行各种实践，尽管这些努力尚未爆发，我们有理由期待2024年的到来。

因此，希望对产品行业感兴趣的朋友们积极拥抱AI技术，敢于尝试，勇于创新。特别是，在这个阶段，不应该盲目模仿。创新是回报率最高的投资，即使短期内创新未必成功，通过实践获取的真知也能显著提升个人及团队对AI行业的理解。