Skip to content

Java Atomic Operations Assignment Notebook

Much has already been written about atomic operations on the web, usually with a focus on atomic read-modify-write (RMW) operations. However, those aren’t the only kinds of atomic operations. There are also atomic loads and stores, which are equally important. In this post, I’ll compare atomic loads and stores to their non-atomic counterparts at both the processor level and the C/C++ language level. Along the way, we’ll clarify the C++11 concept of a “data race”.

An operation acting on shared memory is atomic if it completes in a single step relative to other threads. When an atomic store is performed on a shared variable, no other thread can observe the modification half-complete. When an atomic load is performed on a shared variable, it reads the entire value as it appeared at a single moment in time. Non-atomic loads and stores do not make those guarantees.

Without those guarantees, lock-free programming would be impossible, since you could never let different threads manipulate a shared variable at the same time. We can formulate it as a rule:

Any time two threads operate on a shared variable concurrently, and one of those operations performs a write, both threads must use atomic operations.

If you violate this rule, and either thread uses a non-atomic operation, you’ll have what the C++11 standard refers to as a data race (not to be confused with Java’s concept of a data race, which is different, or the more general race condition). The C++11 standard doesn’t tell you why data races are bad; only that if you have one, “undefined behavior” will result (§1.10.21). The real reason why such data races are bad is actually quite simple: They result in torn reads and torn writes.

A memory operation can be non-atomic because it uses multiple CPU instructions, non-atomic even when using a single CPU instruction, or non-atomic because you’re writing portable code and you simply can’t make the assumption. Let’s look at a few examples.

Non-Atomic Due to Multiple CPU Instructions

Suppose you have a 64-bit global variable, initially zero.

At some point, you assign a 64-bit value to this variable.

When you compile this function for 32-bit x86 using GCC, it generates the following machine code.

As you can see, the compiler implemented the 64-bit assignment using two separate machine instructions. The first instruction sets the lower 32 bits to , and the second sets the upper 32 bits to . Clearly, this assignment operation is not atomic. If is accessed concurrently by different threads, several things can now go wrong:

  • If a thread calling is preempted between the two machine instructions, it will leave the value of in memory – a torn write. At this point, if another thread reads , it will receive this completely bogus value which nobody intended to store.
  • Even worse, if a thread is preempted between the two instructions, and another thread modifies before the first thread resumes, it will result in a permanently torn write: the upper 32 bits from one thread, the lower 32 bits from another.
  • On multicore devices, it isn’t even necessary to preempt one of the threads to have a torn write. When a thread calls , any thread executing on a different core could read at a moment when only half the change is visible.

Reading concurrently from brings its own set of problems:

Here too, the compiler has implemented the load operation using two machine instructions: The first reads the lower 32 bits into , and the second reads the upper 32 bits into . In this case, if a concurrent store to becomes visible between the two instructions, it will result in a torn read – even if the concurrent store was atomic.

These problems are not just theoretical. Mintomic’s test suite includes a test case called , in which one thread stores a bunch of 64-bit values to a single variable using a plain assignment operator, while another thread repeatedly performs a plain load from the same variable, validating each result. On a multicore x86, this test fails consistently, as expected.

Non-Atomic CPU Instructions

A memory operation can be non-atomic even when performed by a single CPU instruction. For example, the ARMv7 instruction set includes the instruction, which stores the contents of two 32-bit source registers to a single 64-bit value in memory.

On some ARMv7 processors, this instruction is not atomic. When the processor sees this instruction, it actually performs two separate 32-bit stores under the hood (§A3.5.3). Once again, another thread running on a separate core has the possibility of observing a torn write. Interestingly, a torn write is even possible on a single-core device: A system interrupt – say, for a scheduled thread context switch – can actually occur between the two internal 32-bit stores! In this case, when the thread resumes from the interrupt, it will restart the instruction all over again.

As another example, it’s well-known that on x86, a 32-bit instruction is atomic if the memory operand is naturally aligned, but non-atomic otherwise. In other words, atomicity is only guaranteed when the 32-bit integer is located at an address which is an exact multiple of 4. Mintomic comes with another test case, , which verifies this guarantee. As it’s written, this test always succeeds on x86, but if you modify the test to force to certain unaligned addresses, it will fail. On my Core 2 Quad Q6600, the test fails when crosses a cache line boundary:

Those are enough processor-specific details for now. Let’s look at atomicity at the C/C++ language level.

All C/C++ Operations Are Presumed Non-Atomic

In C and C++, every operation is presumed non-atomic unless otherwise specified by the compiler or hardware vendor – even plain 32-bit integer assignment.

The language standards have nothing to say about atomicity in this case. Maybe integer assignment is atomic, maybe it isn’t. Since non-atomic operations don’t make any guarantees, plain integer assignment in C is non-atomic by definition.

In practice, we usually know more about our target platforms than that. For example, it’s common knowledge that on all modern x86, x64, Itanium, SPARC, ARM and PowerPC processors, plain 32-bit integer assignment is atomic as long as the target variable is naturally aligned. You can verify it by consulting your processor manual and/or compiler documentation. In the games industry, I can tell you that a lot of 32-bit integer assignments rely on this particular guarantee.

Nonetheless, when writing truly portable C and C++, there’s a long-standing tradition of pretending that we don’t know anything more than what the language standards tell us. Portable C and C++ is designed to run on every possible computing device past, present and imaginary. Personally, I like to imagine a machine where memory can only be changed by mixing it up first:

On such a machine, you definitely wouldn’t want to perform a concurrent read at the same time as a plain assignment; you could end up reading a completely random value.

In C++11, there is finally a way to perform truly portable atomic loads and stores: the C++11 atomic library. Atomic loads and stores performed using the C++11 atomic library would even work on the imaginary computer above – even if it means the C++11 atomic library must secretly lock a mutex to make each operation atomic. There’s also the Mintomic library which I released last month, which doesn’t support as many platforms, but works on several older compilers, is hand-optimized and is guaranteed to be lock-free.

Relaxed Atomic Operations

Let’s return to the original example from earlier in this post. We’ll rewrite it using Mintomic so that all operations are performed atomically on every platform Mintomic supports. First, we must declare as one of Mintomic’s atomic data types.

The type guarantees correct memory alignment for atomic access on each platform. This is important because, for example, the GCC 4.2 compiler for ARM bundled with Xcode 3.2.5 doesn’t guarantee that plain will be 8-byte aligned.

In , instead of performing a plain, non-atomic assignment, we must call .

Similarly, in , we call .

Using C++11’s terminology, these functions are now data race-free. When executing concurrently, there is absolutely no possibility of a torn read or write, whether the code runs on ARMv6/ARMv7 (Thumb or ARM mode), x86, x64 or PowerPC. If you’re curious how and actually work, both functions expand to an inline instruction on x86; for other platforms, consult Mintomic’s implementation.

Here’s the exact same thing written in C++11 instead:

You’ll notice that both the Mintomic and C++11 examples use relaxed atomics, as evidenced by the suffix on various identifiers. The suffix is a reminder that few guarantees are made about memory ordering.

In particular, it is still legal for the memory effects of a relaxed atomic operation to be reordered with respect to instructions which follow or precede it in program order, either due to compiler reordering or memory reordering on the processor itself. The compiler could even perform optimizations on redundant relaxed atomic operations, just as with non-atomic operations. In all cases, the operation remains atomic.

When manipulating shared memory concurrently, I think it’s good practice to always use Mintomic or C++11 atomic library functions, even in cases where you know that a plain load or store would already be atomic on your target platform. An atomic library function serves as a reminder that elsewhere, the variable is the target of concurrent data access.

Hopefully, it’s now a bit more clear why the World’s Simplest Lock-Free Hash Table uses Mintomic library functions to manipulate shared memory concurrently from different threads.

uint64_t sharedValue = 0;
void storeValue() { sharedValue = 0x100000002; }
#pragma pack(2) MINT_DECL_ALIGNED(staticstruct, 64) { char padding[62]; mint_atomic32_t sharedInt; } g_wrapper;
uint32_t foo = 0; void storeFoo() { foo = 0x80286; }
#include<mintomic/mintomic.h>mint_atomic64_t sharedValue = { 0 };
void storeValue() { mint_store_64_relaxed(&sharedValue, 0x100000002); }
uint64_t loadValue() { returnmint_load_64_relaxed(&sharedValue); }
#include<atomic> std::atomic<uint64_t> sharedValue(0); void storeValue() { sharedValue.store(0x100000002, std::memory_order_relaxed); } uint64_t loadValue() { return sharedValue.load(std::memory_order_relaxed); }

« The World's Simplest Lock-Free Hash TableThe Happens-Before Relation »

Compound operations are operations that consist of more than one discrete operation. Expressions that include postfix or prefix increment (), postfix or prefix decrement (), or compound assignment operators always result in compound operations. Compound assignment expressions use operators such as , , , , , , , , and [JLS 2015]. Compound operations on shared variables must be performed atomically to prevent data races and race conditions.

For information about the atomicity of a grouping of calls to independently atomic methods that belong to thread-safe classes, see VNA03-J. Do not assume that a group of calls to independently atomic methods is atomic.

The Java Language Specification also permits reads and writes of 64-bit values to be non-atomic (see rule VNA05-J. Ensure atomicity when reading and writing 64-bit values).

Noncompliant Code Example (Logical Negation)

This noncompliant code example declares a shared variable and provides a method that negates the current value of :

Execution of this code may result in a data race because the value of is read, negated, and written back.

Consider, for example, two threads that call . The expected effect of toggling twice is that it is restored to its original value. However, the following scenario leaves in the incorrect state:

Time

flag=

Thread

Action

1

true

t1

Reads the current value of , true, into a temporary variable

2

true

t2

Reads the current value of , (still) true, into a temporary variable

3

true

t1

Toggles the temporary variable to false

4

true

t2

Toggles the temporary variable to false

5

false

t1

Writes the temporary variable's value to

6

false

t2

Writes the temporary variable's value to

As a result, the effect of the call by t2 is not reflected in ; the program behaves as if was called only once, not twice.

Noncompliant Code Example (Bitwise Negation)

The method may also use the compound assignment operator to negate the current value of :

This code is also not thread-safe. A data race exists because is a non-atomic compound operation.

Noncompliant Code Example (Volatile)

Declaring volatile also fails to solve the problem:

This code remains unsuitable for multithreaded use because declaring a variable volatile fails to guarantee the atomicity of compound operations on the variable.

Compliant Solution (Synchronization)

This compliant solution declares both the and methods as synchronized:

This solution guards reads and writes to the field with a lock on the instance, that is, . Furthermore, synchronization ensures that changes are visible to all threads. Now, only two execution orders are possible, one of which is shown in the following scenario:

Time

flag=

Thread

Action

1

true

t1

Reads the current value of , true, into a temporary variable

2

true

t1

Toggles the temporary variable to false

3

false

t1

Writes the temporary variable's value to

4

false

t2

Reads the current value of , false, into a temporary variable

5

false

t2

Toggles the temporary variable to true

6

true

t2

Writes the temporary variable's value to

The second execution order involves the same operations, but t2 starts and finishes before t1.
Compliance with LCK00-J. Use private final lock objects to synchronize classes that may interact with untrusted code can reduce the likelihood of misuse by ensuring that untrusted callers cannot access the lock object.

Compliant Solution (Volatile-Read, Synchronized-Write)

In this compliant solution, the method is not synchronized, and is declared as volatile. This solution is compliant because the read of in the method is an atomic operation and the volatile qualification assures visibility. The method still requires synchronization because it performs a non-atomic operation.

This approach must not be used for getter methods that perform any additional operations other than returning the value of a volatile field without use of synchronization. Unless read performance is critical, this technique may lack significant advantages over synchronization [Goetz 2006].

Compliant Solution (Read-Write Lock)

This compliant solution uses a read-write lock to ensure atomicity and visibility:

Read-write locks allow shared state to be accessed by multiple readers or a single writer but never both. According to Goetz [Goetz 2006]:

In practice, read-write locks can improve performance for frequently accessed read-mostly data structures on multiprocessor systems; under other conditions they perform slightly worse than exclusive locks due to their greater complexity.

Profiling the application can determine the suitability of read-write locks.

Compliant Solution ()

This compliant solution declares to be of type :

The variable is updated using the method of the class. All updates are visible to other threads.

Noncompliant Code Example (Addition of Primitives)

In this noncompliant code example, multiple threads can invoke the method to set the and fields. Because this class fails to test for integer overflow, users of the class must ensure that the arguments to the method can be added without overflow (see NUM00-J. Detect or prevent integer overflow for more information).

The method contains a race condition. For example, when and currently have the values and , respectively, and one thread calls while another calls , the method might return either or , or it might overflow. Overflow will occur when the first thread reads and after the second thread has set the value of to but before it has set the value of to .

Note that declaring the variables as volatile fails to resolve the issue because these compound operations involve reads and writes of multiple variables.

Noncompliant Code Example (Addition of Atomic Integers)

In this noncompliant code example, and are replaced with atomic integers:

The simple replacement of the two fields with atomic integers fails to eliminate the race condition because the compound operation is still non-atomic.

Compliant Solution (Addition)

This compliant solution synchronizes the and methods to ensure atomicity:

The operations within the synchronized methods are now atomic with respect to other synchronized methods that lock on that object's monitor (that is, its intrinsic lock). It is now possible, for example, to add overflow checking to the synchronized method without introducing the possibility of a race condition.

Risk Assessment

When operations on shared variables are not atomic, unexpected results can be produced. For example, information can be disclosed inadvertently because one user can receive information about other users.

Rule

Severity

Likelihood

Remediation Cost

Priority

Level

VNA02-J

Medium

Probable

Medium

P8

L2

Automated Detection

Some available static analysis tools can detect the instances of non-atomic update of a concurrently shared value. The result of the update is determined by the interleaving of thread execution. These tools can detect the instances where thread-shared data is accessed without holding an appropriate lock, possibly causing a race condition.

ToolVersionCheckerDescription
CodeSonar4.2FB.MT_CORRECTNESS.IS2_INCONSISTENT_SYNC
FB.MT_CORRECTNESS.IS_FIELD_NOT_GUARDED
FB.MT_CORRECTNESS.STCAL_INVOKE_ON_STATIC_CALENDAR_INSTANCE
FB.MT_CORRECTNESS.STCAL_INVOKE_ON_STATIC_DATE_FORMAT_INSTANCE
FB.MT_CORRECTNESS.STCAL_STATIC_CALENDAR_INSTANCE
FB.MT_CORRECTNESS.STCAL_STATIC_SIMPLE_DATE_FORMAT_INSTANCE
Inconsistent synchronization
Field not guarded against concurrent access
Call to static Calendar
Call to static DateFormat
Static Calendar field
Static DateFormat
Coverity7.5

GUARDED_BY_VIOLATION
INDIRECT_GUARDED_BY_VIOLATION
NON_STATIC_GUARDING_STATIC
NON_STATIC_GUARDING_STATIC
SERVLET_ATOMICITY
FB.IS2_INCONSISTENT_SYNC
FB.IS_FIELD_NOT_GUARDED
FB.IS_INCONSISTENT_SYNC
FB.STCAL_INVOKE_ON_STATIC_ CALENDAR_INSTANCE
FB.STCAL_INVOKE_ON_STATIC_ DATE_FORMAT_INSTANCE
FB.STCAL_STATIC_CALENDAR_ INSTANCE
FB.STCAL_STATIC_SIMPLE_DATE_ FORMAT_INSTANCE

Implemented
Parasoft Jtest 10.3 TRS.SSUG, TRS.MRAVImplemented
ThreadSafe1.3

CCE_SL_INCONSISTENT
CCE_CC_CALLBACK_ACCESS
CCE_SL_MIXED
CCE_SL_INCONSISTENT_COL
CCE_SL_MIXED_COL
CCE_CC_UNSAFE_CONTENT

Implemented

Related Guidelines

Bibliography


final class Flag { private boolean flag = true; public void toggle() { // Unsafe flag = !flag; } public boolean getFlag() { // Unsafe return flag; } }
final class Flag { private boolean flag = true; public void toggle() { // Unsafe flag ^= true; // Same as flag = !flag; } public boolean getFlag() { // Unsafe return flag; } }
final class Flag { private volatile boolean flag = true; public void toggle() { // Unsafe flag ^= true; } public boolean getFlag() { // Safe return flag; } }
final class Flag { private boolean flag = true; public synchronized void toggle() { flag ^= true; // Same as flag = !flag; } public synchronized boolean getFlag() { return flag; } }
final class Flag { private volatile boolean flag = true; public synchronized void toggle() { flag ^= true; // Same as flag = !flag; } public boolean getFlag() { return flag; } }
final class Flag { private boolean flag = true; private final ReadWriteLock lock = new ReentrantReadWriteLock(); private final Lock readLock = lock.readLock(); private final Lock writeLock = lock.writeLock(); public void toggle() { writeLock.lock(); try { flag ^= true; // Same as flag = !flag; } finally { writeLock.unlock(); } } public boolean getFlag() { readLock.lock(); try { return flag; } finally { readLock.unlock(); } } }
import java.util.concurrent.atomic.AtomicBoolean; final class Flag { private AtomicBoolean flag = new AtomicBoolean(true); public void toggle() { boolean temp; do { temp = flag.get(); } while (!flag.compareAndSet(temp, !temp)); } public AtomicBoolean getFlag() { return flag; } }
final class Adder { private int a; private int b; public int getSum() { return a + b; } public void setValues(int a, int b) { this.a = a; this.b = b; } }
final class Adder { private final AtomicInteger a = new AtomicInteger(); private final AtomicInteger b = new AtomicInteger(); public int getSum() { return a.get() + b.get(); } public void setValues(int a, int b) { this.a.set(a); this.b.set(b); } }
final class Adder { private int a; private int b; public synchronized int getSum() { // Check for overflow return a + b; } public synchronized void setValues(int a, int b) { this.a = a; this.b = b; } }