Fiber 設計原則

05-24

Fiber 設計原則

前言

最近各種 Fiber 的解析文章，除了 @司徒正美大神的內容外，基本都來自 Lin Clark 的視頻，和 React Fiber Architecture（這篇有譯文）。作為一個忠實的 Vue 黨，一年多沒碰 React，也激起了重學的興趣。

但遺憾的是，React 16 的源碼變化極大，鄙人水平又有限，只好翻翻 PR，看看 Issue，圍觀core team 的討論。然而就在此時，小哥我無意逛到一個很有意思，或許也很有價值的（點贊的很多）。

約么是簡述了 Fiber 的設計原則，但似乎中文世界並沒有翻譯，暫時搬運過來。

順便貼一下中文世界其它的優秀解讀類資料，排名不分先後：
從React到React Fiber (不熟悉 React 的推薦這篇)
完全理解React Fiber | 黯羽輕揚 (熟悉 React 的推薦這篇)
React 16 Fiber源碼速覽 (源碼黨推薦這篇)

有看到更詳細解說更好的長文歡迎留言
PS：昨天找了半天發現熟悉的 updateChildren、updateQueue 都沒了，後面才發現升 16 全被幹掉了…
PPS：我們能否理解，前端框架在 Virtual DOM 的 Diff 優化上已經到瓶頸了，字元串比對演算法再怎麼調優極限都是 O(N)，所以下一步是玩調度器了么。
PPPS：這個技術會有多快被其它框架更進呢？

好了，不開腦洞，原 Issue 如下（如果文不對題別怨我）

Fiber Principles: Contributing To Fiber · Issue #7942 · facebook/react

I just wanted to document a few unique design patterns that apply to Fiber, but not necessarily anything else. Ill start here.

You may mutate the fiber that youre working on during beginWork and completeWork phases but you may not have any other global side-effects. If you need a global side-effect, that have to be moved to the commitWork phase.
Fiber is a fixed data structure. It shares the same hidden class. Never add fields outside of construction in ReactFiber.
Nothing in the reconciler uses dynamic dispatch. I.e. we dont call a first class function, except for user code such as ref callbacks, functional components, render methods, etc. The rest is a static function available in a closure. I.e. use myHelper(obj) instead of obj.myHelper(). Any time we need to branch logic we use a switch statement over a tag which is a number that indicates which type of object were dealing with and which branch to take (see pattern matching).
Many modules are instantiated with a HostConfig object. It is a single constructor that gets called on initialization time. This should be inlinable by a compiler.
Nothing in Fiber uses the normal JS stack. Meaning it does use the stack but it can be compiled into a flat function if needed. Calling other functions is fine - the only limitation is that they cant be recursive.
If I cant use recursion, how do I traverse through the tree? Learn to use the singly linked list tree traversal algorithm. E.g. parent first, depth first:

let root = fiber;let node = fiber;while (true) { // Do something with node if (node.child) { node = node.child; continue; } if (node === root) { return; } while (!node.sibling) { if (!node.return || node.return === root) { return; } node = node.return; } node = node.sibling;}

Why does it need to be this complicated?

We can use the normal JS stack for this but any time we yield in a requestIdleCallback we would have to rebuild the stack when we continue. Since this only lasts for about 50ms when idle, we would spend some time unwinding and rebuilding the stack each time. It is not too bad. However, everything along the stack would have to be aware of how to "unwind" when we abort in the middle of the work flow.
It is plausible we could do this at the level of OCaml algebraic effects but we dont currently have all the features we need and we dont get the performance tradeoffs we want out of the box atm. This is a plausible future way forward though.
Most code lives outside of this recursion so it doesnt matter much for most cases.
Most of what React does is in the space of what the normal stack does. E.g. memoization, error handling, etc. Using the normal stack too, just makes it more difficult to get those to interact.
Everything we put on the stack we generally have to put on the heap too because we memoize it. Maintaining the stack and the heap with the same data is theoretically less efficient.
That said, all of these optimizations might be moot because JS stacks are much more efficient than JS heaps.
One thing that I wanted to try was to compile React components to do work directly on these data structures, just like normal programming languages compile to make mutations etc. to the stack. I think thats where the ideal implementation of React is.

Arent you just working around the lack of threads in the language?

Yes and no.

It is true that we dont have a good option in JavaScript to run threads - and that is a huge problem. We tried exploring various options of running in Web Workers, parallel JS, tried to propose shared immutable persistent data structures to the language, experimented with custom VM tweaks etc. JavaScript the language isnt very suitable for this because of mutable shared runtimes like prototypes. The ecosystem isnt ready for it because you have to duplicate code loading and module initialization across workers. Garbage collectors are not as efficient as they currently are if they have to be thread safe, and VM implementors seem to be unwilling to bare the implementation cost of persistent data structures. Shared mutable typed arrays seems to be moving along but requiring all data to go through this layer seems to be unfeasible in the ecosystem today. Artificial boundaries between different parts of the code base also doesnt work very well and introduces unnecessary friction. Even then you have lots of JS code like utility libraries that have to be duplicated across workers. Which leads to slower start up time and memory overhead. So, yes, threads are likely out of the question until we can target something like Web Assembly.

However, an interesting realization is that there are other benefits of the Fiber architecture that are applicable whether you have threads or not.

ComponentKit, that runs on native, does work using threads for example. It is able to start doing higher priority work in one thread while a lower priority thread is still happening. Leading to a much simpler implementation. However, there are some limitations.

You cant safely abort the background thread. Aborting and restarting a thread is not very cheap. In many languages it is also not safe because you could be in the middle of some lazy initialization work. Even though it is effectively interrupted, you have to continue spending CPU cycles on it. One solution is to do the samething that Fiber does - make an API that has yield points so that you can unwind the stack safely. Then check a flag periodically if you should abort. That is effectively what Fiber does too.

Another limitation is that since you cant immediately abort a thread, you cant be sure if two threads are processing the same component at the same time. This leads to some limitations such as not being able to support stateful class instances (like React.Component). Although that might be a good thing for other reasons.

Another thing that threads dont automatically buy you is the ability to resume part of the work. You cant just memoize part of the work that youve done in one thread and reuse it in another. You can certainly implement that with threads but youd end up with similar complexity as Fiber - on top of the threads.

Threads have a slight benefit that you can start the next work slightly earlier because you can interject instructions while the background thread is still powering down. However, because our yield points are frequent enough by default we dont think that will matter for Fiber.

Therefore, yes, we are writing a more complex implementation because were lacking threads in JavaScript, but because were forced to deal with that well end up with better features than if we simply relied on threads for scheduling.

What about parallelism?

It is true that threads can get you parallelism. However, this has nothing to do with scheduling for responsiveness. You generally dont want to spend one CPU processing work that was already aborted because higher priority work arrived. That doesnt buy you anything at all.

Instead, what you really want is to calculate independent work in different threads. For example, two React siblings can be calculated in parallel threads since theyre disconnected. If we had access to threads, thats what we would do. However, everything about Fiber is still valid for scheduling purposes in that architecture even if we use threads for parallelism of subtrees.

So, in conclusion, no were not just working around limitations in the language.

NOTE: This is not to say that you should choose cooperative scheduling over threads in your project/use case. Its just that for our particular use case it made sense. Threads still make sense in many other cases.

Ok, so cooperative scheduling might have some benefits over preemptive threads but...

Couldnt you just use generator functions like other scheduling frameworks have done?

No.

Theres two reasons for this.

Generators doesnt just let you yield in the middle of a stack. You have to wrap every single function in a generator. This not only adds a lot of syntactic overhead but also runtime overhead in any existing implementation. Its fair that the syntax might be more helpful than not, but the perf issue still stands.
The biggest reason, however, is that generators are stateful. You cant resume in the middle of it.

function* doWork(a, b, c) { var x = doExpensiveWorkA(a); yield; var y = x + doExpensiveWorkB(b); yield; var z = y + doExpensiveWorkC(c); return z;}

If I want to execute this across multiple time slices I can just step through this. However, if I get an update to B when I have already completed doExpensiveWorkA(a) and doExpensiveWorkB(b) but not doExpensiveWorkC(c) there is no way for me to reuse the value x. I.e. to skip ahead to doExpensiveWorkB with a different value for b but still reuse the result of doExpensiveWorkA(a).

This is important to React since we do a lot of memoization.

It is plausible that you can add that as a layer around, but then youre really not gaining much from the use of generators.

There are also languages that have generators that are designed for a more functional use case that has this capability. JS is not one of them.