Login

In

[To see links please register here]

, we talked about writing a pretty basic program, and we made a small hello world program to mimic the x86 variant and then expanded on it to make a hello world-print to file program for Linux. What we didn't talk about was how to plan your program, which is a key step in writing larger programs in assembly.

Flow Charting
We start off with the basics, flow charting. Many of you may already have some experience with this if you're a CS major, and most of those people will probably hate me for bringing it up and think it's a pointless step. It isn't, not for assembly at least. Flow charting is what allows you to determine what functional blocks you want your program to follow. Sure, you can write a program that has no subroutines and just jumps around all over one long stream of instructions, but that's not good code at all. What you need is a nicely structured program that strictly follows your algorithm, because let's be honest, nobody is using assembly just so they can make little hello world applications.

Let's start with the basics of flow charting, for those of you who already know this, you can skip to the next bit if you want. Here's an image that explains the basic flow charting symbols you'll need to know:
[Image: l4dtWEI.png]

So, let's go through what those mean.

We'll start with the start/end symbol. This one is pretty basic. Your program MUST have at least 2 of these (there is 1 exception, but we won't go there). The first one will mark the Entry Point of your program. This is important, because you can use the linker to actually define your entry point as something other than _start. For example, if you had a complex init routine (like libc does), you would define your entry point as that init routine and the routine would then jump to _start when it's finished. The same goes for repeating this symbol at the end. While it's sometimes implicit that the program will just end after a certain block, it's good documentation to have the end symbol stating that your program is done running. Especially if you are doing something with tail recursion.

Next, the arrows. These are by far the most important symbols in your chart, as they are what actually describe the flow control of your program. For simple programs, these may just be arrows pointing to the next block and that's it, but as things get more complicated, and when you start adding loop conditions and if-elses, these arrows become incredibly important. For those cases, you're going to draw an arrow from the decision block to all of the possible routes, and you will label your arrow with the value that corresponds with that path. Example: an if will have 2 arrows, one pointing the direction if it is true, one if it is false.

Third, the IO symbol. These are important to put into your flowchart, because they will signify when you should take input from the user or the operating system. You could replace these with subroutines and flowchart those in, but having their own symbol makes them extremely useful for keeping your charts short and to the point.

The process is a little bit of a confusing one. These aren't for subroutines, these are for doing specific tasks that you already know how to do. For example, if your program had to add 2 numbers, you would use IO blocks to read those numbers, then a process block would add them and set the variable. If you need to use a subroutine, then you'll have to chart that subroutine in (or use another block type, hint).

Decision blocks are your loop conditions, your if's, if-else's, etc. These ones are pretty simple to explain, but you do need to remember that you can't do 2 things in one block. For example, you can't put an if-elif-else in one decision block, you need two.

Now, there are a lot of other flow charting symbols available, some of these are incredibly useful for charting assembly (like the predefined and sub process symbols), so you may want to take a look at these as well (I won't describe them unless we need them).
[Image: LzquE5n.png]

For my examples, I'm using

[To see links please register here]

for flow charting.

Example Flow Charts
Here's a pretty easy one. The program is supposed to calculate the sum of 2 non-negative integers:
[Image: D6Y5s5x.png]

See how this one flows? You start at the top, then it asks for the user to enter a value for A. As long as A is <= 0, it will keep asking. After that it does the same thing with B, then it adds them together to make C, then displays it and the program exits. But even this is too basic to actually build an assembly program out of. We don't know how to read numbers from the user, or how to display them. In order to do that, we need to define a way to read input from the user character by character until we hit the enter key. Then we'll need to add a validation step if what the user entered is not an integer, we'll need to loop back and read the input again. Secondly, we'll need a way to convert those integers to and from text, and a way to display them on screen, since sys_write doesn't do any of that for us. This chart would be sufficient for a C program, it would look like this:

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

But since this is assembly, we don't have that sort of nicety, and that's what makes the flow charting important, without it we have no idea how to actually implement this program.
Now, I'm not going to make you go through the pain of flow charting atoi, itoa, scanf, and printf in each of your charts, we'll chart out the subroutines we need once, and then we'll use the predefined process boxes to implement them in our charts from then on. Remember, your flow charts don't have to be exact, and they shouldn't actually contain any code, but they should precisely describe how your program works to somebody who knows nothing about what you're doing. This brings us to our second key skill.

Pseudocoding
In order to create good flow charts, you will need to know how to break down your program into the various components that make up a good flow chart: processes, decisions, and interactions. These are basic, but broad categories that will encompass multiple flow chart symbols, but that's the reason we use them. When you have an idea in your head, you won't jump straight to a flow chart, you still have to plan out what exactly your program should do. You do this using pseudo code, meaning not (or false) code. This means explaining your code or program in English (or any spoken language for that matter), in terms that are easy to understand, but most importantly do not contain any code!

Let's take the above example, we'll start with the flowchart and work backwards from there. It goes like this:

Ask user for a number that is non-negative and non-zero
Ask user for a number that is non-negative and non-zero
Add up both of the numbers
Tell the user the result

This is extremely high level pseudo-code, and I'm sure it looks unfamiliar or even weird that I listed it out like that, but it's an important step. This is the type of pseudo-code you will see on a project spec or scope, and you need to know how to translate it into usable pseudo-code that you can build a flow chart out of. Let's go ahead and go one step lower, into low level pseudo code.
Remember, even though we're writing low level pseudo code, we're still not writing code. What we're writing is a nonstandard subjective description of the tasks we must do. We will take all 4 of the tasks in our high level description and break them down into their component tasks.

1. Ask user for a number that is non-negative and non-zero
In low level pseudo code, we need to be a little bit more specific about things. Where is the input coming from? Where is it going? etc.

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Notice my use of capitalization. By no means is this a required thing, but I use it to highlight the important parts of this. I am drawing attention to the thing (that we're doing), the what (it's for), and the who (it's coming from).

We can simply repeat this for number 2, but changing the variable it is for.

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

3. Add up both of the numbers
This one is a simple describing the add operation:

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Again here, I capitalized certain aspects to draw attention to them. (but it's not needed)

4. Tell the user the result
This is another simple one, all we need to do is describe the process

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

So, that should have all made pretty basic sense, our pseudo code would look like this (in full)

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Building our flow chart is as simple as adding start and end blocks to that and using the symbols that correspond with the statements. Remember, flow charts are just a graphical representation of pseudo code. We still, however need to know how to read and write. Why don't we try making a high level pseudo code for that:

Read characters from user until enter is pressed
Convert each letter to decimal
Convert decimal string to number

It should be noted, that this pseudo-code has no error handling ability, right now we're going to leave that out (you guys could make psuedo-code and a flowchart for that :tongue:)
Now, you can see how we took the job of reading an integer and broke that down into tasks. We will now go to low level pseudo-code and break these tasks into steps.

1. Read characters from user until enter is pressed

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Note: we can use arrays, since most people understand the concept of a list, which for all intensive purposes is an array.

2. Convert each letter to decimal

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Note: since people know that A is a list, we can use a for each, because they understand both what a list is, and that A is a list. Notice the backwards format of the for-each, we do this to prevent confusion, though we don't have to.

3. Convert decimal string to number
This is where it's going to get difficult. We're going to describe an actual algorithm here. What we've created so far is a BCD (binary coded decimal) string, so we aren't talking about atoi anymore, but we're talking about bcdtoi. For your reference, I've written a simple bcdtoi function (if you need to understand the basic algorithm).

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

For those of you who don't want to cheat, let's make this one in 2 steps. We'll first plan out our algorithm in high level pseudo code (really more like plain English), and then go down.
If we start at the least significant end of the number, then we just have to add everything up, but we have to multiply by 10 for every position we move towards the most significant side. This is pretty easy now that we understand that.[/i][/i][/i][/i][/i][/i][/i]

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

That should make basic sense. We don't have any sort of loop counter because this is pseudo code, meaning we can pretend to do things like automatically know how large a list/array is.

We only need to add one more thing to our pseudo code to have it complete. Since we're not taking in any data, we don't need to START WITH anything, but we need to END WITH our result. With that added, here's the pseudo code for our "read number from user":

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Note: in pseudo code, you should write self descriptive steps whenever you possibly can, but when you can't, you should distinguish your "comments" from the steps. I do this with parens.

Converting pseudo-code to flow charts
Some of you might be wondering, where are all these symbols for while and for-each for flow charting? The short answer: there aren't any, pseudo-code is still ultra high level when it comes to assembly. Your flow chart is the low level planning step, and in low level programming, while loops, for-each's, list sizing, etc don't exist. They won't exist in your flow charts either (even if there are symbols for them).
A while loop just uses an if statement to determine which block to go to next
A for loop is just a while loop, but it uses a counter
A for-each is just a for loop, where you extract the variable out of the list into a temporary one in the body of the block
List sizing doesn't exist. You will need to invent a way to know when you are at the end of the list (or how big it is).

So, let's get started with our example. At first, I'll just make a direct translation, and later on we might talk about optimization (that's the entire focus of the next part).
Starting off with a completely blank flow chart (I'll explain this later).
[Image: Cp4nCKs.png]

Now, don't forget to add your entry point (I'm not making a new screenshot just for that), and let's get into it. Our read character line is pretty easy to make, our flow chart has this capacity built in (because it's already a reduced step).
So, for the first loop, we need a counter for our list, and we need to initialize that to 0, then we need to convert our WHILE to an IF loop. This is what I came up with.
[Image: XYqfczW.png]

NOTE: when flow charting, using a decision symbol, arrows to the left is for FALSE, and to the right is for TRUE

To make things easier in the next step, I'm going to set the last element of our character array to 0. This makes it so we don't need a new variable on the next loop, and then we can pass our counter along as the size.
[Image: v5lGxXl.png]

I also re-initialized I to 0, so we can use it in our for-each loop. At this point, we've completed flowcharting the entire first task of our read-number subroutine. For the for-each, we will need to convert this into an if-loop, but we will need to pull out each element as C before we do anything on it.

Here is the beginning of my loop. Notice how I pull out the variable C and I'm working based on that? This means I don't need to know the size of the list, because the end of the list is terminated with a symbol that the user couldn't have entered (since we aren't doing error checking and using the honor system).
[Image: m8sQwRh.png]

I'm going to finish up the left side of this loop now. If you're reading along, you should try and do it yourself, it is very similar to the last one we did.

[Image: hy8vYzK.png]

Notice how I re-ordered the operations in my loop since we last saw it? This is because I realized in breaking down the task that what I had first thought of wouldn't work. This happens incredibly often in programming, but when you're working on the small scale of flow charting (specifically charting based on pseudo-code), the damage done with a single change is nonexistent in most cases. This will save your ass some day.

At this point, I want to take a quick time out and explain a neat flow charting trick with our loops. Our current arrows (while correct) look pretty messy. The trick is, as long as you only go into the side that you're allowed to branch from, you can go into one of the other vertices. Since we tapped off of BOTH the true and false sides, we can go up through the middle. I restructured our chart to match this:
[Image: ZxlNwVc.png]

That looks better. Good news! We've now completed the second macro-step (task), and I now contains the length of our list, all we need to do is the final computation. This one should be nearly identical, except that we have to initialize one thing first.
Ok, after implementing that, our flow chart is done! I moved a couple things around so it takes up less space, but it's finished.
[Image: 2SHi0xz.png]

Now, all we need to do is label it, then we can go back and update our original chart. I'm naming this READ_NUM
[Image: LaPz9lB.png]

Let's go back and edit our original chart to make use of this. We're starting off with this:
[Image: HsVeW6O.png]

Let's also go through and use our flow chart loop trick to make it look a little cleaner. I'll do that step first
[Image: DX8WGgj.png]

Cool, now we can go through and replace those IO blocks with predefined process blocks:
[Image: IWAmRvI.png]

But, we aren't done yet. READ_NUM outputs a variable called RESULT, not A or B. We need to transfer that now.
[Image: A6QFqLZ.png]

Perfect! We now have done the steps necessary to implement the majority of this application in ARM! Pat yourself on the back.
Side note: I personally like to include my subroutine flow charts on the same document, just so I have reference. You don't have to do this, but it's nice.

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Next, we go through these one by one and add the necessary code to run them. We know that READ_NUM doesn't take any arguments, so we can simply call those subroutines. Remember that in ARM this is a branch (with link) or BL instruction. We also know that since READ_NUM only returns one value, it will be stored in R0 on its return. We will need to choose here how we want to store those variables. Since our program is short, I am choosing to store them in registers, and I'm using R1 and R2 to store A and B respectively. We can then add that to the 20 and 50 lines. We're left with this

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Next up we need to translate the 30 and 60 lines. There are two ways to do this, the optimized way, and the x86 way. Since our next part deals with optimization, we will do this using the x86 way. Below is a spoiler containing the optimized way if you were curious.

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

This should look pretty familiar to you if you have ever used assembly before (most notably x86). In these lines we have to break down that conditional in our flow chart into 2 instructions. We need to test (compare with 0) the value of A and B to determine if they are <= 0, and if they are <= (LE suffix on the branch) we need to branch (jump) up and read again.
Now, we are at the line that reads 70 C <= A + B. In x86, this would be a hassle, we'd have to do something like this

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

and that's just messy. Here, we get to take full advantage of ARM's 3-operand setup, requiring only 1 instruction (and 1 clock cycle) to do what it took 2 in x86. We'll keep up with our standard of holding variables in registers, but for this one, we're going to look ahead to our 80 line and see that we're about to write it out. This means that we'll have to pass it in R0, so why store this in R3 when we can store it in R0 and save a step? Good idea, let's put it in R0.

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

See how we did that? we saved 1 clock cycle (and a whole line) with the 3-operand method, and we saved a whole step (another 1 clock cycle, and 4 bytes of valuable register space) by just putting it in R0.

Now, all we have left to do in our program is the ending (and the beginning). We need to do the following things:

Save R1 and R2
Restore R1 and R2
Return to the OS (with syscall #1)

Remember your push and pop multiple! Your finished start subroutine should look like this:

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Congrats! You've translated the program's main subroutine. Time to do the same thing with the READ_NUM. This one will be a tad more complex, so I'll walk you through how it goes. Start with your flow chart
[Image: vWfHlWA.png]

Now, for this one, we want to separate it into our 3 distinct segments again. We won't name these with anything that sentimental, we'll call it stages 1, 2, and 3. We'll define these by grabbing the logical groups of items, that is the blocks that pertain to certain loops. We want to precede each stage by the initialization that comes along with it though. For stage 1, the init is these two blocks:
[Image: pIGbkKP.png]

So, our READ_NUM sub should start off as follows

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Here I chose R8 for I, and I will likewise choose R9 for J. However, as we go down, we will need space to hold 2 arrays. We won't be able to hold these in registers, so we're going to need to have space elsewhere for them. Since our program is very simple, and we know it will not have any threads or recursion, we can store these on a stack. Why don't we set that up now.
It would be terribly complicated to use the system stack, so we're going to create 2 new ones. ARM has a very useful instruction that allows us to treat any address as the stack, just for that instruction. The instructions are:

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

We'll take full advantage of this trick, allowing us to use a stack and not have to reverse it between loops.
Our stack pointers will be held in R10 and R11 to avoid conflicts, and we'll just define them by using SPACE directives (zero-fill n bytes) in the .data section. After setting all that up, our code looks like this

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

We reserve 48 bytes for each for 2 reasons:
1. we want to be able to hold up to 12 decimal numbers
2. stacks for ARM MUST be aligned to a 4-byte boundary (we can't just push a byte)

Now, let's go ahead and do the reading for variable C. We'll need the sys_read (#3) syscall, and we're reading from fd 0 (stdin)

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

So, at this point, we consult our flow chart, we see that we need to see if C is the enter key (ASCII code 0xA). This means, that we need to compare C (R0) with #0xA, and if it is equal to that, jump out of our loop (to stage 2's init). It comes out looking like this

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

As you can see, this is pretty much the x86 way of doing things, as I've stated, in the next part I'll show you how to make this not look and run like shit. But for now, we move on to the next part, storing our C onto our "stack". Now, the x86 way of doing this would be to save ESP into another register, replace it with our new stack, then use push, but that's horse shit. We're going to do it the ARM way, by using r10 as our stack pointer, and a stack that grows upwards. This will be the STMIA instruction (store multiple, increment after. Here's what it looks like

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Whoa, that looks weird, doesn't it? Let me explain what all of that means, namely the exclamation point and the curly braces.
We'll start with the curly braces. I explained at a different point that the PUSH and POP instructions were just aliases to STM and LDM instructions, and you've seen me push and pop multiple registers, those having braces on them, this is for the exact same reason. This instruction takes a list of registers, and the list is encased in curly braces. Even though we're only pushing one register, we still put braces around it (I'm not sure if we have to, ARM's documentation doesn't give a one-register example, so we'll keep them just in case).
As for the exclamation point, this is what makes it act like a stack. The mark tells the CPU that after it writes the data (and does increment or decrement), it should write that value back into R10, so we don't need to keep a counter of how many items we've pushed.

Hopefully you made sense of that, I have a tendency to under-explain complicated things and over-explain trivial ones, so I tried to make it a medium level explanation in that case. Ok, we're off to the next step, reading in a new value of C. For this, we just copy the code from above and paste it inside stage 1. After that, we have the trivial increment of I (R8). Now remember in that, ARM doesn't have an INC instruction, so we just need a simple ADD instruction against R8 twice and #1. Lastly, we need to branch back up to stage 1 to continue our loop. Your completed stage 1 should be as follows

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Perfect, but it's important to note here that we don't actually need that line to move into R7, since R7 will have been unmodified at this point, it will still contain our syscall number, so we don't need to waste the cpu time setting it again. Go ahead and delete that one.

Now, we can move on to stage 2's init procedure. According to our flow chart it looks like this:
[Image: xsznCFS.png]

The very first thing we need to do is set the end of our list to 0. Well lucky us, that was done for us (because we used the SPACE directive), so anything we didn't explicitly set is already 0. Sweet, we can skip that entirely (saved CPU time). Next up, we need to set I back to 0, that's pretty easy.

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Next, we need to load in the first number the user entered, since A[I] is the start of that stack, we'll want to go ahead and set R10 back to =listA, then we'll want to pop it off the stack, incrementing afterwards. This means using a LDMIA instruction. We need to define a variable to hold C in again, and I don't see any reason why we can't just use R0 for it.

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Now, we're into stage 2. I'm not going to walk through this one (in an effort to make this a shorter thread), it's basically the same as stage 1. Go ahead and try to do it on your own. My code is below:

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

Makes sense. Stage 3 init we need to define (and initialize to 0) 2 variables: RESULT and J. For these I'm going to use R0 and R7 respectively. I want to keep RESULT in R0 so we have one less step to do when we're finished, I'll be able to use R1 for T in this case. Oh, and I just noticed, I defined R11 as the "B" stack, when it's really the N stack. Just keep that in mind. I don't want to walk you through this one either, but I will comment it nicely. Below is my code. PLEASE try your hand at this, especially the multiply part!

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

I bet we came up with different ways of doing that multiply step. See, ARM has the ability to multiply and add in the same step, which is incredibly handy in this case. The drawback is it won't operate on constants, so we need to load a register with #10 to do this. However, we still get benefit from this, because our multiply step is in a loop and that #10 is a constant. All we have left now is to do clean up, which also means we need to do set up. In this subroutine we used:
R0, R1, R6, R7, R10, R11. We'll need to push those and pop them out at the end, then its as simple as returning (BX LR) and we're all done. Our finished subroutine looks like this (note, I put this in a separate file, so I made it global and the data segment is in here with it):

Hidden Content

You must

[To see links please register here]

or

[To see links please register here]

to view this content.

And there you have it, our READ_NUM sub in 42 lines of ARM code, and it's very far from optimized. Our entire program (still missing WRITE_NUM) is 58 lines of assembly, which is amazingly small, I'd be up over 120 lines if this were x86, and easily double that in clock cycles. Looks like even if you write code like an x86 programmer, ARM is still the way to go.

I hope you all enjoyed reading this extremely long installation of the comprehensive ARM assembly tutorial series, and more than that, I hope you learned a lot about the wonderful world of ARM and RISC. In the next and final installment, we'll finish up this program and learn how to optimize it so that it runs at the super speed ARM is capable of.

As a conversation starter, why don't YOU (the reader) make up pseudo-code (high and low level) for the WRITE_NUM subroutine, and give us a flow chart for it! Bonus points (literal NSP or rep) if you write the code for it also, and extra bonuses if your code is optimized! (hint: condition and status flags). See you guys next time.[/hide]
[/hide]
[/hide]

besin858891