
C++11 Tutorial: Explaining the Ever-Elusive Lvalues and Rvalues

lvalues and rvalues

Every C++ programmer is familiar with the terms lvalue and rvalue. It’s no surprise, since the C++ standard uses them “all over”, as do many textbooks. But what do they mean? Are they still relevant now that C++11 has five value categories? It’s about time to clear up the mystery and get rid of the myths.

Lvalues and rvalues were introduced in a seminal article by Strachey et al. (1963) that presented CPL. A CPL expression appearing on the left-hand side of an assignment expression is evaluated as a memory address into which the right-hand side value is written. Later, left-hand expressions and right-hand expressions became lvalues and rvalues, respectively.

One of CPL’s descendants, B, was the language on which Dennis Ritchie based C. Ritchie borrowed the term lvalue to refer to a memory region to which a C program can write the right-hand side value of an assignment expression. He left out rvalues, feeling that lvalue and “not lvalue” would suffice.

Later, rvalue made it into K&R C and ISO C++. C++11 extended the notion of rvalues even further by letting you bind rvalue references to them. Although nowadays lvalues and rvalues have slightly different meanings from their original CPL meanings, they are encoded “in the genes of C++,” to quote Bjarne Stroustrup. Therefore, understanding what they mean and how the addition of move semantics affected them can help you understand certain C++ features and idioms better, and write better code.

Right from Wrong

Before attempting to define lvalues, let’s look at some examples:

int x=9;
std::string s;
int *p=0;
int &ri=x;

The identifiers x, s, p and ri are all lvalues. Indeed, they can appear on the left-hand side of an assignment expression and therefore seem to justify the CPL generalization: “Anything that can appear on the left-hand side of an assignment expression is an lvalue.” However, counter-examples are readily available:

void func(const int * pi, const int & ri) {
    *pi=7; //compilation error, *pi is const
    ri=8;  //compilation error, ri is const
}

*pi and ri are const lvalues. Therefore, they cannot appear on the left-hand side of an assignment expression after their initialization. This property doesn’t make them rvalues, though.

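Now let’s look at some examples of rvalues. Literals such as 7, 'a' and false are instances of rvalues. String literals such as "hello world!" are the one exception: a string literal is an lvalue of array type, although the pointer it converts to is an rvalue: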

7==x;
char c= 'a';
bool clear=false;
const char s[]="hello world!";

Another subcategory of rvalues is temporaries. During the evaluation of an expression, an implementation may create a temporary object that stores an intermediary result:

int func(int y, int z, int w){
int x=0;
x=(y*z)+w;
return x ;
}

In this case, an implementation may create a temporary int to store the result of the sub-expression y*z. Conceptually, a temporary expires once the full expression that created it has been evaluated. Put differently, it is destroyed at the end of that full expression, which usually means upon reaching the nearest semicolon.
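The lifetime rule is easier to see with a class type that reports its own construction and destruction. The following is a minimal sketch; Tracer and trace_temporary are hypothetical names invented for this illustration and are not part of any library:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type: records its construction and destruction in a log.
struct Tracer {
    std::vector<std::string>* log;
    explicit Tracer(std::vector<std::string>* l) : log(l) { log->push_back("ctor"); }
    ~Tracer() { log->push_back("dtor"); }
    int value() const { return 42; }
};

// Returns the sequence of events observed around a temporary Tracer.
std::vector<std::string> trace_temporary() {
    std::vector<std::string> log;
    int v = Tracer(&log).value();      // the temporary lives until the end of this full expression
    log.push_back("next statement");   // by now the temporary has already been destroyed
    assert(v == 42);
    return log;
}
```

Calling trace_temporary() should yield the three entries "ctor", "dtor" and "next statement", in that order: the destructor runs before the following statement starts.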

You can create temporaries explicitly, too. An expression in the form C(arguments) creates a temporary object of type C:

std::cout<<std::string ("test").size()<<std::endl;

Contrary to the CPL generalization, rvalues may appear on the left-hand side of an assignment expression in certain cases:

std::string ()={"hello"}; //assigns to a temp string

You’re probably more familiar with the shorter form of this idiom:

std::string("hello"); //creates a temp string

Clearly, the CPL generalization doesn’t really cut it for C++, although intuitively, it does capture the semantic difference between lvalues and rvalues.

So, what do lvalues and rvalues really stand for?

A Matter of Identity

An expression is a sequence of operators and operands that yields a value, for example a function call, a sizeof expression, an arithmetic expression, a logical expression, and so on. You can classify C++ expressions into two categories: values with identity and values with no identity. In this context, identity means a name, a pointer, or a reference that enables you to determine if two objects are the same, to change the state of an object, or to copy it:

struct {int x; int y;} s; //no type name, value has id
string &rs= *new string;
const char *p= rs.data();

s, rs and p are identities of values. We can draw the following generalization: lvalues in C++03 are values that have identity. By contrast, rvalues in C++03 are values that have no identity. C++03 rvalues are accessible only inside the expression in which they are created:

int& func();
int func2();
func(); //this call is an lvalue
func2(); //this call is an rvalue
sizeof(short); //rvalue
new double; //new expressions are rvalues
S::S() {this->x=0; /*the keyword this is an rvalue expression*/}

A function’s name (not to be confused with a function call) is an lvalue expression, but in most contexts it is implicitly converted to an rvalue: a pointer that holds the function’s address. Similarly, an array’s name is an lvalue that is implicitly converted to an rvalue pointer to the first element of the array:

int& func3();
int& (*pf)()=func3;//func3 converts to an rvalue pointer
int arr[2];
int* pi=arr;//arr converts to an rvalue pointer

Because rvalues are short-lived, you have to capture them in lvalues if you wish to access them outside the context of their expression:

std::size_t n=sizeof(short);
double *pd=new double;
struct S
{
int x, y;
S() { S *p=this; p->x=0; p->y=0;}
};

Remember that any expression that evaluates to an lvalue reference (e.g., a function call, an overloaded assignment operator, etc.) is an lvalue. Any expression that returns an object by value is an rvalue.
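Here is a short sketch of that rule. The Pair type and its at() and copy_at() members are made up for this example; the only point is that a call returning a reference can be assigned to and have its address taken, while a call returning by value cannot:

```cpp
#include <cassert>

// Hypothetical example type, for illustration only.
struct Pair {
    int first;
    int second;
    int& at(int i) { return i == 0 ? first : second; }            // the call is an lvalue
    int copy_at(int i) const { return i == 0 ? first : second; }  // the call is an rvalue
};

int demo_reference_return() {
    Pair p{1, 2};
    p.at(1) = 10;            // OK: assigns through the returned reference
    int* addr = &p.at(0);    // OK: an lvalue has an address
    *addr = 5;
    // p.copy_at(1) = 10;    // error: the call is an rvalue of type int
    // int* bad = &p.copy_at(0); // error: cannot take the address of an rvalue
    assert(p.copy_at(0) == 5);
    return p.first + p.second;   // 5 + 10
}
```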

Prior to C++11, identity (or the lack thereof) was the main criterion for distinguishing between lvalues and rvalues. However, the addition of rvalue references and move semantics to C++11 added a twist to the plot.

Binding Rvalues

C++11 lets you bind rvalue references to rvalues, effectively prolonging their lifetime as if they were lvalues:

//C++11
int func2(){
    return 17; //the call func2() is an rvalue
}
int main() {
    int x=0;
    int&& rr=func2(); //binds to the temporary and prolongs its lifetime
    std::cout<<rr<<std::endl; //output: 17
    x=rr; // x=17 after the assignment
}

Using lvalue references where rvalue references are required is an error:

int& func2(){//compilation error: cannot bind
return 17;      //an lvalue reference to an rvalue
}
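To see which kind of reference an expression can bind to, you can let overload resolution decide. The sketch below uses a hypothetical overload set called which() and a helper make_int(), both invented for this illustration. Note in particular that a named rvalue reference is itself an lvalue:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload set: reports which kind of reference the argument bound to.
std::string which(int&)       { return "int&"; }
std::string which(const int&) { return "const int&"; }
std::string which(int&&)      { return "int&&"; }

int make_int() { return 17; }   // a call to make_int() is an rvalue

std::string demo_named_rvalue_reference() {
    int&& rr = make_int();      // rr binds to the temporary returned by make_int()
    assert(rr == 17);
    return which(rr);           // the name rr is an lvalue, so the int& overload is selected
}
```

With these overloads, which(x) for a plain int x selects int&, which(make_int()) and which(std::move(x)) select int&&, and a const int argument selects const int&.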

In C++03, copying the rvalue to an lvalue is the preferred choice (in some cases you can bind the rvalue to an lvalue reference to const to achieve a similar effect):

int func2(){ // an rvalue expression
return 17;
}
int m=func2(); // C++03-style copying
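Both C++03 options can be sketched side by side. In this example make_name() and demo_const_ref_binding() are hypothetical names chosen for illustration; std::string is used instead of int so that the difference between copying and binding is more meaningful:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string make_name() { return "temporary"; }   // the call is an rvalue

std::size_t demo_const_ref_binding() {
    // std::string& bad = make_name();     // error: non-const lvalue reference to an rvalue
    const std::string& ref = make_name();  // OK: the temporary lives as long as ref does
    std::string copy = make_name();        // C++03-style: capture the rvalue in an lvalue
    assert(ref == copy);
    return ref.size() + copy.size();       // 9 + 9
}
```

The reference to const avoids naming a second object, but it gives you read-only access; the copy gives you an object you can modify.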

For fundamental types, the copy approach is reasonable. However, as far as class objects are concerned, spurious copying might incur performance overhead. Instead, C++11 encourages you to move objects. Moving means pilfering the resources of the source object, instead of copying it.

For further information about move semantics, read C++11 Tutorial: Introducing the Move Constructor and the Move Assignment Operator.

//C++11 move semantics in action
std::string source ("abc"), target;
target=std::move(source);  //pilfer source
//source no longer owns the resource; it is left in a valid but unspecified state
std::cout<<"source: "<<source<<std::endl; //typically prints an empty string
std::cout<<"target: "<<target<<std::endl; //target: abc
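What “pilfering” means becomes clearer with a class that manages a resource by hand. The following is only a sketch under simplifying assumptions: Buffer is a hypothetical type written for this example, it is move-constructible but not copyable, and it deliberately leaves the moved-from object empty so that the result can be checked:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning type, used only for illustration.
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}
    ~Buffer() { delete[] data_; }
    Buffer(const Buffer&) = delete;             // copying is disabled in this sketch
    Buffer& operator=(const Buffer&) = delete;
    // Move constructor: take over the source's array and leave the source empty.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }
    std::size_t size() const { return size_; }
    const int* data() const { return data_; }
private:
    std::size_t size_;
    int* data_;
};

bool demo_move() {
    Buffer source(4);
    const int* original = source.data();
    Buffer target(std::move(source));   // std::move(source) permits the move; no array is copied
    assert(target.size() == 4);
    return target.data() == original    // target now owns the very same array
        && source.data() == nullptr     // source was left empty by our move constructor
        && source.size() == 0;
}
```

No element is copied here: the pointer simply changes owner. The empty state of source is a property of this particular Buffer, not something the language guarantees for every type.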

How does move semantics affect the semantics of lvalues and rvalues?

The Semantics of Change

In C++03, all you needed to know was whether a value had identity. In C++11 you also have to examine another property: movability. The combination of identity and movability (i and m, respectively, with a minus sign indicating negation) produces five meaningful value categories in C++11, “a whole type-zoo,” as one of my Twitter followers put it:

  • i-m: lvalues are non-movable objects with identity. These are classic C++03 lvalues from the pre-move era. The expression *p, where p is a pointer to an object, is an lvalue. Similarly, dereferencing a pointer to a function yields an lvalue.
  • im: xvalues (“eXpiring” values) refer to objects near the end of their lifetime (before their resources are moved, for example). An xvalue is the result of certain kinds of expressions involving rvalue references, e.g., std::move(mystr).
  • i: glvalues, or generalized lvalues, are values with identity. These include lvalues and xvalues.
  • m: rvalues include xvalues, temporaries, and values that have no identity.
  • -im: prvalues, or pure rvalues, are rvalues that are not xvalues. Prvalues include literals and function calls whose return type is not a reference.

A detailed discussion about the new value categories is available in section 3.10 of the C++11 standard.

It has often been said that the original semantics of C++03 lvalues and rvalues remains unchanged in C++11. However, the C++11 taxonomy isn’t quite the same as that of C++03; in C++11, every expression belongs to exactly one of the value classifications lvalue, xvalue, or prvalue.
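You can ask the compiler which of the three classifications an expression belongs to. For an expression e, decltype((e)), with the extra pair of parentheses, yields T& if e is an lvalue, T&& if it is an xvalue, and plain T if it is a prvalue. The macros and names below are hypothetical helpers written for this sketch, not standard facilities:

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper macros built on decltype((e)).
#define IS_LVALUE(e)  (std::is_lvalue_reference<decltype((e))>::value)
#define IS_XVALUE(e)  (std::is_rvalue_reference<decltype((e))>::value)
#define IS_PRVALUE(e) (!std::is_reference<decltype((e))>::value)

namespace value_category_demo {    // names used for illustration only
int x = 0;
int* p = &x;
std::string make_string();         // declared only; it is never actually called

static_assert(IS_LVALUE(x), "a named variable is an lvalue");
static_assert(IS_LVALUE(*p), "dereferencing a pointer yields an lvalue");
static_assert(IS_LVALUE("hello"), "a string literal is an lvalue");
static_assert(IS_XVALUE(std::move(x)), "std::move yields an xvalue");
static_assert(IS_PRVALUE(42), "an integer literal is a prvalue");
static_assert(IS_PRVALUE(x + 1), "an arithmetic result is a prvalue");
static_assert(IS_PRVALUE(make_string()), "a call returning by value is a prvalue");
}
```

All of these checks are performed at compile time. In the terms of the list above, a glvalue is anything for which IS_LVALUE or IS_XVALUE holds, and an rvalue is anything for which IS_XVALUE or IS_PRVALUE holds.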

In Conclusion

Fifty years after their inception, lvalues and rvalues are still relevant not only in C++ but in many contemporary programming languages. C++11 changed the semantics of rvalues, introducing xvalues and prvalues. Conceptually, you can tame this type-zoo by grouping the five value categories into supersets, where glvalues include lvalues and xvalues, and rvalues include xvalues and prvalues. Still confused? It’s not you. It’s BCPL’s heritage that exhibits unusual vitality in a world that’s light years away from the punched card and 16 kilobyte RAM era.

About the author:

Danny Kalev is a certified system analyst and software engineer specializing in C++. Kalev has written several C++ textbooks and contributes C++ content regularly on various software developers’ sites. He was a member of the C++ standards committee and has a master’s degree in general linguistics. He’s now pursuing a PhD in linguistics. Follow him on Twitter.

