3Nigma's utility site
eniAsm, the layout

As I said in the first post, I am going to build myself an assembler, and I'm not lying.

I also stated that I would make my progress public, and on that front I am not lying either. ;)

OK, several days have passed, so what is the current state of the project, you might ask?

Good question, but before we get to that, let me show you a few language specifications I thought of implementing. Normally an asm instruction looks like "mov ax,3", which means "move 3 to ax". I'm going to change the standard way of writing this kind of instruction by being more explicit: instead of the usual syntax, I'll go for "move 3 to ax". Why? Because, as I said in the beginning, I intend to lure people into this low-level world, and what better way to do that than by simplifying the logic as much as possible? More comprehensible usually (though not necessarily) means more attractive.
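To show that the explicit form is only a different spelling of the same instruction, here is a minimal C++ sketch that rewrites an eniAsm move statement (terminating comma and comment already removed) into conventional Intel syntax. The function name toIntelSyntax is hypothetical; this is not code from eniAsm.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: rewrites one eniAsm "move <value> to <register>"
// statement into the conventional Intel form "mov <register>,<value>".
// Returns an empty string if the statement does not have that shape.
std::string toIntelSyntax(const std::string& stmt) {
    std::istringstream in(stmt);
    std::string verb, value, to, reg, extra;
    if (!(in >> verb >> value >> to >> reg) || (in >> extra))
        return "";                      // wrong number of words
    if (verb != "move" || to != "to")
        return "";                      // not a move statement
    return "mov " + reg + "," + value;  // same instruction, Intel word order
}
```

For example, "move 16 to al" becomes "mov al,16": the words are the same information, just reordered.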

Furthermore, each instruction will end with a comma [it's my way of separating things]. I know this is a bit confusing at first for you [the one who has a little experience in asm], but I made this decision for the sake of clearer logic and an easier entry into asm for beginners. Anyway, I don't think this is much of a problem, because advanced assembly programmers who come in contact with this assembler will only do so out of a desire to dissect an assembler, not to practice good code on it. Good assembly code is worth practicing on high-end assemblers made by professionals, not on ones made by high-school students who have nothing better to do in their spare time than mingle with bits and protocols. That's my opinion, at least. If I am wrong... I am glad :).

So, each statement will end with a comma, which tells the assembler that the instruction has ended and that a new instruction is expected on the next line [in a later version, the assembler will no longer care whether commands are separated by new lines or not].
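A minimal C++ sketch of the comma rule, assuming one statement per line as in v0.2 and assuming any comment has already been removed from the line. takeStatement is a hypothetical name, not a function from the eniAsm source.

```cpp
#include <string>

// Hypothetical helper: if `line` contains the terminating comma, copies the
// text before it into `stmt` and returns true. Otherwise returns false,
// since an unterminated statement is an error under the comma rule.
// Assumes comments were already stripped from the line.
bool takeStatement(const std::string& line, std::string& stmt) {
    std::string::size_type comma = line.find(',');
    if (comma == std::string::npos)
        return false;
    stmt = line.substr(0, comma);
    return true;
}
```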

Another important aspect is the number base the assembler uses.

I decided to make the assembler use the pure decimal number system. The reason is fairly simple: anyone [or at least most of us :)] is familiar with base-10 numbers. I don't know if this decision will come back to haunt me later in development, but I'm a man of reason and chance: I live in the present, and if the reasons are right, I take the chance. So I'll take this one too.

Why might this be a problem? Well, if you have some assembler knowledge, you'll know that base 16 [hexadecimal] is the standard notation, not the well-known base 10 [decimal].
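Since eniAsm expects decimal where most assemblers use hexadecimal, translating existing code means converting literals such as 10h into 16. The C++ sketch below assumes the common 'h' suffix convention for hexadecimal and treats every other literal as decimal; literalValue is a hypothetical helper, not part of eniAsm.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns the numeric value of an assembler literal.
// "10h" (hex, 'h' suffix) -> 16; a literal without the suffix is decimal.
// Returns -1 on malformed input.
long literalValue(const std::string& lit) {
    if (lit.empty()) return -1;
    bool hex = (lit.back() == 'h' || lit.back() == 'H');
    std::string digits = hex ? lit.substr(0, lit.size() - 1) : lit;
    if (digits.empty()) return -1;
    int base = hex ? 16 : 10;
    long value = 0;
    for (char c : digits) {
        unsigned char u = static_cast<unsigned char>(c);
        int d;
        if (std::isdigit(u))
            d = c - '0';
        else if (hex && std::isxdigit(u))
            d = std::tolower(u) - 'a' + 10;   // 'a'..'f' -> 10..15
        else
            return -1;                        // not a valid digit
        value = value * base + d;
    }
    return value;
}
```

For example, literalValue("10h") gives 16 and literalValue("16h") gives 22, which is why int 10h and int 16h become inter 16 and inter 22 in eniAsm.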

There will also be some syntax differences, such as the "mov -> move" you've seen above, but these will be made public through article updates. If you have names you think are appropriate for different assembly instructions, feel free to tell me and maybe we'll work something out.

So, calling interrupt 10h normally looks like this:

int 10h

Well, in my assembler's syntax, it will look like this:

inter 16, *notice that I've also added the ending comma as specified above

Looking at the above instruction, you'll see an asterisk (*) on the right. This special character tells the assembler that anything after it is not meant for assembling [it starts a comment].

So the above line will be assembled correctly by eniAsm. Multiline comments will be implemented in later versions.
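Removing the comment can be as simple as cutting the line at the first asterisk. A minimal C++ sketch, assuming (as holds for the three v0.2 instructions) that a '*' never appears inside an instruction itself; stripComment is a hypothetical name.

```cpp
#include <string>

// Hypothetical helper: drops everything from the first '*' onward,
// because eniAsm treats the rest of the line as a comment.
// If there is no '*', find() returns npos and the whole line is kept.
std::string stripComment(const std::string& line) {
    return line.substr(0, line.find('*'));
}
```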



Now, let's take a small example and see how eniAsm's syntax differs.

Normal assembler:

mov ah,00h
mov al,10h
int 10h ;BIOS set video mode -> graphics 4/16 colors

mov ah,0
int 16h ;BIOS wait for key press
ret ;return to operating system

eniAsm version of the same program:

move 0 to ah,
move 16 to al,
inter 16, * BIOS set video mode -> graphics 4/16 colors

move 0 to ah,
inter 22, * BIOS wait for key press
rets, * return to operating system

Isn't this more readable? Confusing for the advanced, appealing for the newbie, in my opinion :).

You may wonder what actual coding I have done so far. Well, I have been experimenting quite a bit, and let me tell you one thing: although I mentioned that I would use Borland C++ 3.1 as the compiler for my project, due to code maintenance and the lack of support from Borland, my code will slowly migrate to Bloodshed Dev-C++ standards (C99), but not yet.

OK, fine and dandy, but have I programmed anything?

Well, yes, although only this one program so far, for my first assembler post. The source is compiled with Borland C++ 3.1.

So, here it is.

How many instructions does my early assembler test know?

Only 3:

move <value> to <register>,
inter <interrupt number>,
rets,

You'll have to compile it with Borland C++ 3.1. The way you use the assembler is:

<assembler.executable> <easm_source_file> <easm_output_file>


If you keep the original source [as downloaded], compile it, and take the example source file from here, your console command will look like this:
ENIASM_v0.2 ENIEX_V1.EAS eniex_v1.com
Upon executing the above command, you will get a screen with the following output:
<***>Building source code ENIEX_V1.EAS !
eAsm code assembly report:

move 0 to ah : 180 0
move 16 to al : 176 16
inter 16 : 205 16
move 0 to ah : 180 0
inter 22 : 205 22
rets :195

<***>Program built : eniex_v1.com !
EniAsm v0.2 by 3Nigma[eni4ever.com]

What does this screen mean?
Some lines are self-explanatory, while others, such as "move 0 to ah : 180 0", are not so much. The first part [move 0 to ah] is, as you probably already guessed, the instruction currently being processed, but what does "180 0" mean?
Well, the numbers that follow each instruction on its line are the machine-code bytes of that instruction [the opcode followed by any operand bytes], written in decimal. Opcodes [raw bits] are the raw language every processor understands.
This listing is a feature I built for the current, painstaking debugging phase. It helps me follow the instructions as they are translated into machine language and lets me tweak things where necessary.
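The decimal numbers in the report are standard x86 encodings: MOV r8, imm8 is the byte B0h plus the register number, followed by the immediate; INT imm8 is CDh followed by the interrupt number; a near RET is the single byte C3h. The C++ sketch below reproduces those bytes for the three v0.2 instructions. The function names are hypothetical, and this is not eniAsm's implementation, which goes through the x86Instruction class described further down.

```cpp
#include <string>
#include <vector>

// Intel's numbering of the 8-bit registers: AL=0, CL=1, DL=2, BL=3,
// AH=4, CH=5, DH=6, BH=7. Returns -1 for anything else.
int reg8Code(const std::string& reg) {
    static const char* names[] = {"al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"};
    for (int i = 0; i < 8; ++i)
        if (reg == names[i]) return i;
    return -1;
}

// move <imm8> to <reg8>  ->  (B0h + reg), imm8
std::vector<unsigned char> encodeMove(int value, const std::string& reg) {
    int r = reg8Code(reg);
    if (r < 0 || value < 0 || value > 255) return {};   // unsupported operand
    return {static_cast<unsigned char>(0xB0 + r), static_cast<unsigned char>(value)};
}

// inter <n>  ->  CDh, n   (INT imm8)
std::vector<unsigned char> encodeInter(int n) {
    if (n < 0 || n > 255) return {};
    return {0xCD, static_cast<unsigned char>(n)};
}

// rets  ->  C3h   (near RET)
std::vector<unsigned char> encodeRets() { return {0xC3}; }
```

With AH being register number 4, encodeMove(0, "ah") yields B4h 00h, i.e. 180 0, matching the first line of the report.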

If you get the same screen, the assembler managed to assemble your code and your program is ready to go [eniex_v1.com, 11 bytes in length: the bytes listed above].

For now, the only output file type, and the one that lets you run your program from Windows, is the old .com [raw binary executable] file.
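A .COM file has no header at all: DOS loads the bytes at offset 100h of a segment and starts executing at the first byte. That is also why a plain near RET (rets) works as an exit: DOS pushes a zero word on the stack, and offset 0 of the program segment prefix holds an INT 20h (terminate) instruction. Writing the output therefore only requires dumping the bytes in binary mode, as in this C++ sketch (writeCom is a hypothetical name):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes raw machine code as a .COM file.
// No header is needed; binary mode keeps bytes such as 0Ah untouched.
bool writeCom(const std::string& path, const std::vector<unsigned char>& code) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(code.data()),
              static_cast<std::streamsize>(code.size()));
    return static_cast<bool>(out);   // false if the write failed
}
```

Writing the bytes from the report (180 0 176 16 205 16 180 0 205 22 195) this way produces an 11-byte file.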

In the future I plan to implement other file formats.

So what does it do? Well, when you run it, 2 things happen:
1. It switches to another video mode [mode 10h, set through interrupt 10h].
2. It waits in that mode until a key is pressed, then returns to the operating system.
Well... you can't build games with this assembler yet, but you have to admit it's a fun start.

Dissecting the assembler

OK, we have reached an interesting part of this article: the part where I explain the current assembler's layout.
To do that, I'll start with a diagram showing the relationship between the 3 major blocks of code in the assembler's structure:

As you can see, there are 3 sections in the current assembler layout:

  1. Data structures
  2. x86Instruction Class
  3. Main application



Let's dissect them one by one. The first one, Data structures, consists of all the data structures needed by the x86Instruction class [their relationship is shown in the diagram above].

These data types are:
  • enum sub_instr_type
    • enumerates the subtype of the operation, ex: "immediate_to_data"
  • enum instr_type
    • enumerates the instruction types known by the assembler
  • enum instr_operands
    • enumerates the types of operands involved in the assembler command
  • struct instr_layout
    • probably the most important data type, since it contains Intel's opcode specifications. It is used when producing the final opcode.
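The article does not show these declarations, so the following C++ sketch is only a guess at how they could look. The type names and the immediate_to_data enumerator come from the list above; every other enumerator, every struct field, and the layoutTable array are hypothetical.

```cpp
// Hypothetical reconstruction: only the type names and immediate_to_data
// come from the article; all other names and fields are assumptions.
enum sub_instr_type { no_sub_type, immediate_to_data };
enum instr_type { unknown_instr, instr_move, instr_inter, instr_rets };
enum instr_operands { op_none, op_al, op_cl, op_dl, op_bl,
                      op_ah, op_ch, op_dh, op_bh, op_immediate };

// One opcode specification: a base opcode plus how operands modify
// or follow it.
struct instr_layout {
    instr_type type;
    sub_instr_type subType;
    unsigned char baseOpcode;   // e.g. 0xB0 for MOV r8, imm8
    bool regInOpcode;           // register number is added to baseOpcode
    unsigned char immBytes;     // immediate bytes that follow the opcode
};

// Example table covering the three v0.2 instructions.
const instr_layout layoutTable[] = {
    {instr_move,  immediate_to_data, 0xB0, true,  1},
    {instr_inter, no_sub_type,       0xCD, false, 1},
    {instr_rets,  no_sub_type,       0xC3, false, 0},
};
```

Keeping the encoding rules in a table like this means adding an instruction is mostly a matter of adding a row.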



The next section is the most important one in the assembler [no wonder it occupies more than 75% of the current assembler source file]: the x86Instruction class.
Let's have a look at it, shall we?
class x86Instruction{
private:
    instr_layout RInstr;
    instr_type RInstrType;
    sub_instr_type RSubInstrType;
    int currComLine;
    char *formatedCommand;

public:
    char commandLength;
    void procRawCom(char *com);
    void chargeOpcodes();
    instr_type getInstrType(char *com);
    instr_operands getRegister(char *reg);
    void getOperands(char *com);
    char* encodeInstr();
};
The data types contained in it are:
  • instr_layout RInstr;
    • data type explained above
  • instr_type RInstrType;
    • data type explained above
  • sub_instr_type RSubInstrType;
    • data type explained above
  • int currComLine;
    • current line being processed [needed for later debug implementations]
  • char *formatedCommand;
    • the final built command after the instruction is processed. At this stage, the instruction has been converted into its equivalent opcode bytes.
  • char commandLength;
    • the length of formatedCommand, needed to read all of formatedCommand (strlen will not suffice, since it stops at the first \0 in the string, and formatedCommand often needs to contain \0 bytes).
The functions contained in it are:
  • void procRawCom(char *com);
    • routine that does all the work on the current *com [assembly command]. The work includes:
      • loading RInstrType //the command type
      • loading RSubInstrType //the subcommand type
      • loading the operands specified in the instruction
      • loading and processing the opcodes
  • void chargeOpcodes();
    • loads the opcodes and fills in formatedCommand, making it ready for external use
  • instr_type getInstrType(char *com);
    • acquires the instruction type of com, ex: "move"
  • instr_operands getRegister(char *reg);
    • returns the register (or other operand type) involved in the current command
  • void getOperands(char *com);
    • acquires all the operands [calls getRegister] in the com command and also sets RSubInstrType
  • char* encodeInstr();
    • returns formatedCommand
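The reason for commandLength is easy to demonstrate: "move 0 to ah" assembles to B4h 00h, and that second, zero byte is exactly where strlen() stops. A minimal C++ sketch, in which encodeMoveZeroToAh is a hypothetical stand-in for filling formatedCommand and returning commandLength:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for formatedCommand/commandLength: fills `out`
// with the encoding of "move 0 to ah" (B4h 00h) and returns its length.
std::size_t encodeMoveZeroToAh(char* out) {
    out[0] = static_cast<char>(0xB4);   // MOV AH, imm8
    out[1] = 0x00;                      // the immediate value 0
    return 2;                           // true length, tracked separately
}

// strlen() would report only the bytes before the first zero.
std::size_t bytesSeenByStrlen(const char* buf) { return std::strlen(buf); }
```

Writing the buffer with fwrite(buf, 1, len, file) rather than fputs() or anything else strlen()-based keeps the zero byte in the output.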



The last section is the Main application. Basically, the main application puts it all together, along with "aquireRawCommand", an external function that takes the current line from the easm source file and strips it down to manageable code.
ex: "rets, * return to operating system" -> "rets"
So the main function processes each command line of the argv[1] file through "procRawCom" and writes the result of "encodeInstr()" to the argv[2] file.
The rest of the code in main() is for debugging purposes only and can easily be stripped out.

Note:

The 2 arguments passed to the assembler [first = easm source, second = output binary .com file] are mandatory, since they are the only ones the assembler operates on.
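The main() described above can be sketched as a small driver. This C++ version is an illustration, not the eniAsm source: runAssembler and LineAssembler are hypothetical names, and the per-line translation (aquireRawCommand plus x86Instruction in eniAsm) is passed in as a function so the sketch stays independent of it. It enforces the two mandatory arguments, stops at the first line that cannot be assembled, and writes the collected bytes to the output file.

```cpp
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical per-line translator: returns the machine code for one source
// line (empty for blank or comment-only lines) and sets ok=false on error.
using LineAssembler = std::vector<unsigned char> (*)(const std::string& line, bool& ok);

// argv[1] = eniAsm source, argv[2] = .com file to produce. Returns 0 on success.
int runAssembler(int argc, char* argv[], LineAssembler assemble) {
    if (argc != 3) {   // both arguments are mandatory
        std::fprintf(stderr, "usage: <assembler.executable> <easm_source_file> <easm_output_file>\n");
        return 1;
    }
    std::ifstream src(argv[1]);
    if (!src) {
        std::fprintf(stderr, "cannot open %s\n", argv[1]);
        return 1;
    }

    std::vector<unsigned char> program;
    std::string line;
    int lineNo = 0;
    while (std::getline(src, line)) {
        ++lineNo;
        bool ok = true;
        std::vector<unsigned char> bytes = assemble(line, ok);
        if (!ok) {
            std::fprintf(stderr, "line %d: cannot assemble \"%s\"\n", lineNo, line.c_str());
            return 1;
        }
        program.insert(program.end(), bytes.begin(), bytes.end());
    }

    std::ofstream out(argv[2], std::ios::binary);   // .COM = raw bytes, no header
    if (!out) {
        std::fprintf(stderr, "cannot create %s\n", argv[2]);
        return 1;
    }
    out.write(reinterpret_cast<const char*>(program.data()),
              static_cast<std::streamsize>(program.size()));
    return out ? 0 : 1;
}
```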

Well, that's about it for the current version, eniAsm v0.2. I suggest you look over the code. I know it's a bit messy, but I'm a learner; I'll do my best in the future.

As I said: "It is not much [3 instructions and all], but it's a start!". Please keep that in mind when you criticize me. ;)

I also want to restate that the future release of eniAsm [v0.5, hopefully] will have more instructions and a small "Hello world!" example, will be better structured, and will be compiled in Dev-C++.
I have chosen this environment because I have reached a point where better IDE project management is needed.

Till next time, take care... and stay tuned ;)
Any questions/suggestions/whatever at all…post them here.
(c) 3Nigma, 23 March 2008

3Nigma on March 21 2008 22:45:43
Copyright "3Nigma's brain"© 2008