Non-Blocking Algorithm - Wait-Free Fix-size Slots

hellwolf 2008-11-30 11:16:19
Original: http://blog.chinaunix.net/u/8057/showart_1673353.html


Non-Blocking Algorithm - Wait-Free Fix-size Slots Management

Author: ZC Miao <hellwolf.misty ãt gmail dôṫ com>

Date: Sunday, November 30 2008
Problem

Provide a non-blocking algorithm that maintains an array of fixed-size slots with the following interface:

* void InitSlots(void)

Initializes the slots.
* SLOTID GetSlot(void)

Exclusively acquires the id of a free slot, so that no other user can get the same slot.
* void PutSlot(SLOTID id)

Reclaims a slot returned by GetSlot and makes it available again.
* DATA* DecodeSlotID(SLOTID id)

Decodes a slot id and returns the memory address of that slot.

Primitive CAS

The atomic operation CAS (compare-and-swap) atomically compares the contents of a memory location to a given value and, if they are the same, modifies the contents of that memory location to a given new value [1]. The following algorithms assume that CAS is available on the target hardware.
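
As a rough illustration of these semantics, CAS can be sketched on top of C11 atomics. The names cas_int and cas_demo are assumptions made for this example and do not come from the article; atomic_compare_exchange_strong is the standard C11 call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper (not from the article): CAS on an int.
 * Returns true if *addr held `expected` and was replaced by `desired`;
 * otherwise leaves *addr untouched and returns false. */
static bool cas_int(atomic_int *addr, int expected, int desired)
{
    /* On failure the C11 call writes the current value into `expected`;
     * that only affects this function's local copy of the parameter. */
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

/* Small demonstration of both outcomes. */
static void cas_demo(void)
{
    atomic_int x;
    atomic_init(&x, 5);

    assert(cas_int(&x, 5, 7));    /* x was 5, so it becomes 7 */
    assert(!cas_int(&x, 5, 9));   /* x is 7, not 5: nothing changes */
    assert(atomic_load(&x) == 7);
}
```

Calling cas_demo() exercises the successful and the failing case once each.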
First Algorithm (with mistakes)


typedef struct
{
int next;
DATA data;
}Node;

Node Slots[MAX_SLOT_NUM];

int head;

void InitSlots(void)
{
int i;

for (i = 0; i < MAX_SLOT_NUM - 1; ++i)
{
Slots[i].next = i+1;
}
Slots[i].next = NULL_ID;
head = 0;
}

int GetSlot(void)
{
int oldhead;
int next;

do {
oldhead = head;
if (oldhead == NULL_ID)
{
return NULL_ID;
}
else
{
next = Slots[oldhead].next;
}
} while(!CAS(&head, oldhead, next));

return oldhead;
}

void PutSlot(int id)
{
int oldhead;

do {
oldhead = head;
Slots[id].next = oldhead;
} while(!CAS(&head, oldhead, id));
}

DATA* DecodeSlotID(int id)
{
return &Slots[id].data;
}


The idea of this algorithm is to use CAS when modifying head, so that the update succeeds only if head still holds the value that was read earlier. This pattern is commonly called read-modify-write. But it gives rise to the well-known ABA problem.
ABA Problem

The above algorithm has a subtle problem: it assumes that if the id in head did not change, the list did not change either. But it can easily happen that another task takes head and head.next and then returns head. The id in head is then the same as before, but head.next is different. This is known as the ABA problem [2].
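
The interleaving described above can be replayed deterministically in a single thread. The following is only a sketch under stated assumptions: cas is a plain, non-atomic stand-in for the hardware primitive (acceptable here because nothing runs concurrently), MAX_SLOT_NUM is set to 4 and NULL_ID to -1 for the example, and the names next_of, get_slot, put_slot and aba_double_allocates are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOT_NUM 4      /* assumed size, for the example only */
#define NULL_ID (-1)        /* assumed value, for the example only */

static int next_of[MAX_SLOT_NUM];   /* the "next" field of each slot */
static int head;

/* Non-atomic stand-in for CAS; fine in a single-threaded replay. */
static bool cas(int *addr, int expected, int desired)
{
    if (*addr != expected)
        return false;
    *addr = desired;
    return true;
}

static void init_slots(void)
{
    int i;
    for (i = 0; i < MAX_SLOT_NUM - 1; ++i)
        next_of[i] = i + 1;
    next_of[i] = NULL_ID;
    head = 0;
}

static int get_slot(void)
{
    int oldhead, next;
    do {
        oldhead = head;
        if (oldhead == NULL_ID)
            return NULL_ID;
        next = next_of[oldhead];
    } while (!cas(&head, oldhead, next));
    return oldhead;
}

static void put_slot(int id)
{
    int oldhead;
    do {
        oldhead = head;
        next_of[id] = oldhead;
    } while (!cas(&head, oldhead, id));
}

/* Returns true if the stale CAS succeeds and a slot is handed out twice. */
static bool aba_double_allocates(void)
{
    init_slots();                       /* free list: 0 -> 1 -> 2 -> 3 */

    /* Task A begins GetSlot and is preempted just before its CAS. */
    int a_oldhead = head;               /* 0 */
    int a_next = next_of[a_oldhead];    /* 1 */

    /* Task B runs in the meantime. */
    int s0 = get_slot();                /* takes 0, list: 1 -> 2 -> 3 */
    int s1 = get_slot();                /* takes 1, list: 2 -> 3 */
    put_slot(s0);                       /* returns 0, list: 0 -> 2 -> 3 */

    /* Task A resumes: head is 0 again, so its CAS wrongly succeeds. */
    bool swapped = cas(&head, a_oldhead, a_next);   /* head becomes 1 */

    /* Slot 1 is still owned by task B, yet it is handed out again. */
    return swapped && get_slot() == s1;
}
```

In this replay slot 1 ends up owned by two users at once, which is exactly the corruption that the unchanged id in head hides.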

There are several ways to solve it. Valois proposed a memory management method that tracks the use count of pointers [3]. It ensures that a pointer held by anyone is never reallocated until no one holds a copy of it, which prevents the ABA problem. Michael and Scott published corrections to mistakes in Valois's memory management method [4].

Another way is to use a pointer tag, which adds extra "tag" bits to the pointer. The tag is usually incremented on every copy operation. Because of this, the next compare-and-swap will fail even if the addresses are the same, because the tag bits will not match. This does not completely solve the problem, as the tag bits will eventually wrap around, but it helps to avoid it.
Use Pointer Tag to Avoid ABA Problem


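typedef union
{
/** The whole id as one integer; this is the operand of CAS */
uint_t Value;

/** The same bits viewed as two fields */
struct
{
/** The "tag", incremented by SLOTID_CAS */
uhalfint_t Counter;
/** Index of the slot in the Slots array */
uhalfint_t Index;
} Data;
} SLOTID;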


The type "uhalfint_t" is half the width of uint_t, and uint_t is an unsigned integer type. The "Counter" here is the "tag" of the pointer.

Now the algorithm looks like this:


typedef struct
{
SLOTID next;
DATA data;
}Node;

Node Slots[MAX_SLOT_NUM];

SLOTID head;

static inline
SLOTID NewSLOTID(uhalfint_t index)
{
SLOTID id;

id.Data.Counter = 0;
id.Data.Index = index;

return id;
}

static inline
bool SLOTID_CAS(SLOTID *id, SLOTID oldid, SLOTID newid)
{
/* Increase the counter to avoid the ABA problem */
++newid.Data.Counter;

/* Swap in the new id (the original passed oldid.Value twice) */
return CAS(&id->Value, oldid.Value, newid.Value);
}

void InitSlots(void)
{
int i;

for (i = 0; i < MAX_SLOT_NUM - 1; ++i)
{
Slots[i].next = NewSLOTID(i+1);
}
Slots[i].next = NewSLOTID(NULL_ID);
head = NewSLOTID(0);
}

SLOTID GetSlot(void)
{
SLOTID oldhead;
SLOTID next;

do {
oldhead = head;
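/* A union cannot be compared with ==; test the index field instead */
if (oldhead.Data.Index == (uhalfint_t)NULL_ID)
{
return NewSLOTID(NULL_ID);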
}
else
{
next = Slots[oldhead.Data.Index].next;
}
} while(!SLOTID_CAS(&head, oldhead, next));

return oldhead;
}

void PutSlot(SLOTID id)
{
SLOTID oldhead;

do {
oldhead = head;
Slots[id.Data.Index].next = oldhead;
} while(!SLOTID_CAS(&head, oldhead, id));
}

DATA* DecodeSlotID(SLOTID id)
{
return &Slots[id.Data.Index].data;
}


The key to the algorithm is the SLOTID_CAS operation: every change to a SLOTID goes through SLOTID_CAS, which increments the Counter and then performs the CAS. This turns ABA into ABA'. The index can be the same, but the counter is unlikely to be the same; the wider the range of Counter, the less likely ABA becomes. On a 32-bit machine, the range of Counter is [0..2^16-1].
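
To see ABA become ABA', the same kind of single-threaded replay can be run with tagged ids. This is a sketch under assumptions: the id is packed into a uint32_t with explicit shifts instead of the union, tagged_cas is a non-atomic stand-in that follows SLOTID_CAS by incrementing the counter of the new id, NULL_INDEX is an assumed sentinel, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOT_NUM 4          /* assumed size, for the example only */
#define NULL_INDEX 0xFFFFu      /* assumed sentinel index */

typedef uint32_t slotid_t;      /* high 16 bits: counter, low 16 bits: index */

static slotid_t make_id(uint16_t counter, uint16_t index)
{
    return ((uint32_t)counter << 16) | index;
}
static uint16_t id_index(slotid_t id)   { return (uint16_t)(id & 0xFFFFu); }
static uint16_t id_counter(slotid_t id) { return (uint16_t)(id >> 16); }

static slotid_t next_of[MAX_SLOT_NUM];
static slotid_t head;

/* Non-atomic stand-in: bump the counter of the new id, then compare and swap. */
static bool tagged_cas(slotid_t *addr, slotid_t oldid, slotid_t newid)
{
    newid = make_id((uint16_t)(id_counter(newid) + 1u), id_index(newid));
    if (*addr != oldid)
        return false;
    *addr = newid;
    return true;
}

static void init_slots(void)
{
    uint16_t i;
    for (i = 0; i < MAX_SLOT_NUM - 1; ++i)
        next_of[i] = make_id(0, (uint16_t)(i + 1));
    next_of[i] = make_id(0, NULL_INDEX);
    head = make_id(0, 0);
}

static slotid_t get_slot(void)
{
    slotid_t oldhead, next;
    do {
        oldhead = head;
        if (id_index(oldhead) == NULL_INDEX)
            return oldhead;
        next = next_of[id_index(oldhead)];
    } while (!tagged_cas(&head, oldhead, next));
    return oldhead;
}

static void put_slot(slotid_t id)
{
    slotid_t oldhead;
    do {
        oldhead = head;
        next_of[id_index(id)] = oldhead;
    } while (!tagged_cas(&head, oldhead, id));
}

/* Returns true if the stale CAS of the preempted task is rejected. */
static bool stale_cas_is_rejected(void)
{
    init_slots();
    slotid_t a_oldhead = head;                       /* counter 0, index 0 */
    slotid_t a_next = next_of[id_index(a_oldhead)];  /* counter 0, index 1 */

    slotid_t s0 = get_slot();   /* takes index 0 */
    (void)get_slot();           /* takes index 1 */
    put_slot(s0);               /* index 0 is head again, now with counter 1 */

    /* Same index as before, different counter: the CAS fails. */
    return !tagged_cas(&head, a_oldhead, a_next)
        && id_index(head) == 0 && id_counter(head) == 1;
}
```

The preempted task would simply retry its loop with the fresh head. With a 16-bit counter the protection still lapses if a counter wraps all the way around between the read and the CAS.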
Wider CAS

The problem with packing the counter and the index into one integer is obvious: on a 32-bit machine the array is limited to 2^16 elements, and the counter is limited to 2^16 values, after which it wraps to 0. 2^16 is not big enough to soothe the skeptics, so some architectures provide a wider CAS. Wider CAS is different from Multi CAS: the former operates on adjacent memory fields, hence "wider", whereas the latter can operate on unrelated memory locations.

Later Intel x86 processors provide an instruction called CMPXCHG8B, which compares and swaps 8 bytes [5]. With this instruction we can CAS a normal memory pointer, instead of a packed array index, together with its "tag", whose range is then as large as 2^32.
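
A minimal sketch of the wider variant, assuming a 32-bit index (or pointer-sized value) paired with a 32-bit tag in one 64-bit word. It relies on the C11 atomic_compare_exchange_strong on an _Atomic uint64_t; whether the compiler emits CMPXCHG8B for it depends on the target and is not guaranteed here. WideId, wide_pack, wide_unpack and wide_cas are hypothetical names, and the tag is bumped on the new id in the same way as SLOTID_CAS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit id: 32-bit tag plus 32-bit index. */
typedef struct {
    uint32_t tag;
    uint32_t index;
} WideId;

static uint64_t wide_pack(WideId id)
{
    return ((uint64_t)id.tag << 32) | id.index;
}

static WideId wide_unpack(uint64_t v)
{
    WideId id = { (uint32_t)(v >> 32), (uint32_t)v };
    return id;
}

/* Increment the tag of the new id, then CAS all 8 bytes at once. */
static bool wide_cas(_Atomic uint64_t *addr, WideId oldid, WideId newid)
{
    uint64_t expected = wide_pack(oldid);

    ++newid.tag;    /* wraps only after 2^32 increments */
    return atomic_compare_exchange_strong(addr, &expected, wide_pack(newid));
}
```

For example, starting from a head of {tag 0, index 0}, a successful wide_cas to {tag 0, index 1} stores {tag 1, index 1}, and repeating the same call with the old head then fails.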

CAS2 is a real instance of Multi CAS, but it is available only on some Motorola 680x0 processors.
Load-Link and Store-Conditional(LL/SC)

In computer science, load-link (LL, also known as "load-linked" or "load and reserve") and store-conditional (SC) are a pair of instructions that together implement a lock-free atomic read-modify-write operation.

Load-link returns the current value of a memory location. A subsequent store-conditional to the same memory location will store a new value only if no updates have occurred to that location since the load-link. If any updates have occurred, the store-conditional is guaranteed to fail, even if the value read by the load-link has since been restored. As such, an LL/SC pair is stronger than a read followed by a compare-and-swap (CAS), which will not detect updates if the old value has been restored [6].

LL/SC can finally make the skeptics happy: it does not merely make the ABA problem look like ABA', but solves it in another way. The algorithm with LL/SC can be:


typedef struct
{
int next;
DATA data;
}Node;

Node Slots[MAX_SLOT_NUM];

int head;

void InitSlots(void)
{
int i;

for (i = 0; i < MAX_SLOT_NUM - 1; ++i)
{
Slots[i].next = i+1;
}
Slots[i].next = NULL_ID;
head = 0;
}

int GetSlot(void)
{
int oldhead;
int next;

do {
oldhead = LL(&head);
if (oldhead == NULL_ID)
{
return NULL_ID;
}
else
{
next = Slots[oldhead].next;
}
} while(!SC(&head, next));

return oldhead;
}

void PutSlot(int id)
{
int oldhead;

do {
oldhead = LL(&head);
Slots[id].next = oldhead;
} while(!SC(&head, id));
}

DATA* DecodeSlotID(int id)
{
return &Slots[id].data;
}

hmsuccess 2008-12-05
mark
hellwolf 2008-12-05


To use the above algorithm, users also have to use LL/SC to load and store the slot id, because of the ABA problem. But since all real LL/SC implementations have limitations, the LL/SC version of the algorithm is actually more difficult to use than the CAS version.
Realistic Limitations

Real implementations of LL/SC can be found on Alpha, PowerPC [7], MIPS, and ARMv6 (or above). But they are all weak LL/SC: SC can fail even if there was no update between LL and the corresponding SC. For example:

* The CPU can reserve only one memory region at a time.
* A context switch between LL and SC can cause the SC to fail.

The first limitation can break many idealized algorithms based on LL/SC. So the CAS version of the algorithm is still preferable.
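
Portable C has no direct LL/SC, but the C11 call atomic_compare_exchange_weak is allowed to fail spuriously in a similar way to a weak SC, and compilers typically map it onto an LL/SC pair on such machines. The sketch below, with the hypothetical name fetch_increment, shows the usual remedy: a retry loop that treats every failure, spurious or not, the same way.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical example: atomically add 1 and return the previous value. */
static int fetch_increment(atomic_int *counter)
{
    int seen = atomic_load(counter);

    /* The weak form may fail even when *counter == seen. On any failure
     * it reloads the current value into `seen`, so the loop just retries. */
    while (!atomic_compare_exchange_weak(counter, &seen, seen + 1)) {
        /* nothing to do: `seen` already holds the fresh value */
    }
    return seen;
}
```

The GetSlot and PutSlot loops above have the same shape, which is why a spurious failure costs them an extra iteration rather than correctness.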
Conclusion

Wait-Free Fix-size Slots Management is a simple form of non-blocking memory management: it manages only fixed-size slots. But it already brings out many of the common problems and tricks of non-blocking algorithms. Later we'll see an implementation of a Wait-Free FIFO Queue based on this Slots Management.

1. Compare-and-swap from Wikipedia http://en.wikipedia.org/wiki/Compare-and-swap

2. ABA problem from Wikipedia http://en.wikipedia.org/wiki/ABA_problem

3. J.D. Valois, Lock-Free Data Structures, PhD Thesis, Rensselaer Polytechnic Institute, Department of Computer Science, 1995.

4. Maged M. Michael, Michael L. Scott, Correction of a Memory Management Method for Lock-Free Data Structures, Department of Computer Science, University of Rochester, 1995.

5. "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A: Instruction Set Reference, A-M" (PDF). Retrieved on 2007-12-15.

6. Load-Link/Store-Conditional from Wikipedia http://en.wikipedia.org/wiki/Load-Link/Store-Conditional

7. "Power ISA Version 2.05". Power.org (2007-10-23). Retrieved on 2007-12-18.
hellwolf 2008-12-05
update

Original: http://blog.chinaunix.net/u/8057/showart_1673353.html

Non-Blocking Algorithm - Fix-size Slots Management

Author: ZC Miao <hellwolf d o t misty a,t gmail d o t com>

Revision:

* 2008-12-04

Add revision information.

Realistic Limitations for LL/SC.
* 2008-11-30

First revision.

hityct1 2008-12-01
mark
hellwolf 2008-11-30


Real implementations of LL/SC can be found on Alpha, PowerPC, MIPS, and ARMv6 (or above). But they are all weak LL/SC: SC can fail even if there was no update between LL and the corresponding SC, for example when a context switch happens between them. Fortunately, this weakness does not break the above algorithm.