Bharat Banate's Work Profile


Monday, December 3, 2007

The Year 2038 Bug

It's barely 8 years since we had the millennium bug, so don't say you didn't get enough warning! A lot of systems in the world may have date rollover troubles in a fraction over 30 years' time. The millennium bug (more accurately known as the Two Digit Century Rollover Bug) was caused by using 2 digits instead of 4 for the year, so Christmas 2007 falls on 12/25/07. Of course, when 1999 rolled over to 2000, the first day of the new century became 01/01/00, and this could have had serious consequences had all the old systems not been sorted out in advance. The problem will happen again in 2100, 2200 and so on, if anyone is silly enough to still be using two-digit year dates.

But the Unix bug will occur in 2038. That's because the Unix date system counts seconds from January 1, 1970 and holds the count in a time_t, traditionally a signed 32-bit integer. The highest value it can hold is 2^31 - 1 = 2147483647 seconds, which is about 24855.13 days. Add that to Jan 1 1970 and you get Jan 19 2038! So early that morning, any software using a signed 32-bit integer for a date will wrap around to a large negative value, which corresponds to December 13, 1901. So how are you going to cope with this problem?
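If you want to check the arithmetic yourself, here's a quick C sketch (my own illustration, not part of the original post; it assumes your C library's gmtime() handles this value, which any 32- or 64-bit time_t does):

#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t max32 = 2147483647;              /* 2^31 - 1 seconds */

    printf("%.2f days\n", max32 / 86400.0); /* about 24855.13 days */
    printf("%s", asctime(gmtime(&max32)));  /* Tue Jan 19 03:14:07 2038 */
    return 0;
}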

Friday, September 21, 2007

Unix: The Unix Philosophy

Essentially, UNIX is made up of files. In fact, every aspect of UNIX is looked at as a file. When we write some data to be displayed on the screen, for example, the data is actually written to a screen file, and a certain device driver in the kernel is activated. This driver controls a particular device, in our case the screen, and the contents of the screen file are displayed on it. Files that relate to hardware are known as "special files".
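As a small illustration of the idea (my own sketch, not from the original post), a C program can put text on the screen simply by opening the terminal's special file, /dev/tty, and writing to it:

#include <stdio.h>

int main(void)
{
    FILE *screen = fopen("/dev/tty", "w"); /* the terminal's special file */

    if (screen == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(screen, "Hello from a special file!\n");
    fclose(screen);
    return 0;
}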

We have one universal file system: UNIX itself. But this is broken up into many smaller file systems. By default, i.e. when we install UNIX, one root file system and two user file systems are created. Normally file systems correspond to physical sections of the disk, basically the root file system and a number of user file systems.

These file systems are in turn broken up into directories (which are themselves viewed as files) and files. These directories can further have sub-directories and files, giving rise to a hierarchical, tree-like structure.

In DOS, we sometimes divide the disk into logical sections like C and D. Each of these logical drives has its own set of directories and files. To move from one drive to another we just need to specify the drive at the DOS prompt and hit Enter.

But while we are at one drive we can still access a file on another drive, and both drives are always available by default. In UNIX there is a slight difference. While the root file system and the two user file systems created by default are loaded automatically, any other file system can be accessed only if it is explicitly mounted. Mounting essentially means attaching the file system into the directory tree and making it available to the system. And considering that file systems are themselves viewed by UNIX as files, when the time comes for them to be accessed they have to be available, just like any other file.

Take the floppy drive, for example. This too is treated by UNIX as a file. A read from or write to the floppy is first done through a "special file", from which the contents are transferred to the actual floppy. But to access the floppy's contents through the file connected to it, that file system has to be mounted.
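On Linux, for instance, a program can perform the mount itself through the mount(2) system call. A hedged sketch (the device and directory names are hypothetical, and the program must run as root):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* Attach the file system on the floppy at /dev/fd0 into the
       directory tree at /mnt/floppy, read-only. */
    if (mount("/dev/fd0", "/mnt/floppy", "vfat", MS_RDONLY, NULL) == -1) {
        perror("mount");
        return 1;
    }

    /* ... the floppy's files are now reachable under /mnt/floppy ... */

    if (umount("/mnt/floppy") == -1)
        perror("umount");
    return 0;
}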

Saturday, September 8, 2007

C: Is C under DOS the same as C under UNIX ?

Well, the topic seems weird... people can spend hours trying to decide on this. There will be many people on either side, and none willing to budge. However, most of them will agree that in the end C remains a single entity no matter what the OS under which it is run. Why? The basic structure of C, its power, its data types, its control structures, its syntax all remain the same no matter which OS it runs under. So, where is the difference?
To prove a point to the non-believers, let's write a program and run it under DOS and UNIX.


#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;

    for (i = 0; i < argc; i++)
    {
        printf("Argument is %s\n", argv[i]);
    }

    return 0;
}

Let's run this program with the parameter 'Hello World'. Under both DOS and UNIX, two lines will be displayed: the program name and 'Hello World'.
Now doesn't this prove that C under DOS and UNIX is the same?

No !!!
Let's run the program again, but with a different parameter. Instead of 'Hello World', pass a * .
And what do we get then?
Under DOS we get the program name and the star ' * '.
But under UNIX there is a difference. Instead of the star we get a directory listing.

So is that proof that C under these two OSes is different? Not so fast.
Let's first see what happened. Under DOS, C took the star as an argument and printed it. That's because the DOS shell, COMMAND.COM, took it for what it was: a star. The UNIX shell, however, interprets the star itself. Before the program ever runs, the shell expands the * into the names of all the files in the current directory (so-called wildcard expansion). These names were then passed to the program as arguments, and the program duly displayed them on screen.
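Incidentally, the same expansion the shell performs is available to C programs through the POSIX glob(3) function. A small sketch of my own, assuming a POSIX system:

#include <stdio.h>
#include <glob.h>

int main(void)
{
    glob_t results;
    size_t i;

    if (glob("*", 0, NULL, &results) == 0) { /* expand * as the shell would */
        for (i = 0; i < results.gl_pathc; i++)
            printf("Argument is %s\n", results.gl_pathv[i]);
        globfree(&results);
    }
    return 0;
}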

On the face of it, yes, the program worked differently, and therefore we can say C is different under DOS and UNIX. But is it really the program that gives us this difference in output? Not at all. It was the shell. If the shell had not been built to interpret the star, the directory listing would never have appeared. And that is hardly the fault of the program.

Thursday, September 6, 2007

News: The World Will End on January 19, 2038

On January 19, 2038, that is precisely what's going to happen, at least as far as programs that keep time in a 32-bit time_t are concerned.

For the uninitiated, time_t is a data type used by C and C++ programs to represent dates and times internally. (You Windows programmers out there might also recognize it as the basis for the CTime and CTimeSpan classes in MFC.) time_t is actually just an integer, a whole number, that counts the number of seconds since January 1, 1970 at 12:00 AM Greenwich Mean Time. A time_t value of 0 would be 12:00:00 AM (exactly midnight) 1-Jan-1970, a time_t value of 1 would be 12:00:01 AM (one second after midnight) 1-Jan-1970, etc. Since one year lasts a little over 31 000 000 seconds, the time_t representation of January 1, 1971 is about 31 000 000, the time_t representation of January 1, 1972 is about 62 000 000, etc.


If you're confused, here are some example times and their exact time_t representations:

Date & time                     time_t representation
01-Jan-1970, 12:00:00 AM GMT                        0
01-Jan-1970, 12:00:01 AM GMT                        1
01-Jan-1970, 12:01:00 AM GMT                       60
01-Jan-1970, 01:00:00 AM GMT                     3600
02-Jan-1970, 12:00:00 AM GMT                    86400
03-Jan-1970, 12:00:00 AM GMT                   172800
01-Feb-1970, 12:00:00 AM GMT                  2678400
01-Mar-1970, 12:00:00 AM GMT                  5097600
01-Jan-1971, 12:00:00 AM GMT                 31536000
01-Jan-1972, 12:00:00 AM GMT                 63072000
01-Jan-2003, 12:00:00 AM GMT               1041379200
01-Jan-2038, 12:00:00 AM GMT               2145916800
19-Jan-2038, 03:14:07 AM GMT               2147483647
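If you'd like to verify a few of these for yourself, here's a short C sketch (my own, assuming a standard C library):

#include <stdio.h>
#include <time.h>

int main(void)
{
    /* A few of the values from the table above. */
    time_t samples[] = { 0, 60, 86400, 31536000, 2147483647 };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        char buf[64];

        strftime(buf, sizeof buf, "%d-%b-%Y, %I:%M:%S %p GMT",
                 gmtime(&samples[i]));
        printf("%11ld  %s\n", (long)samples[i], buf);
    }
    return 0;
}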

By the year 2038, the time_t representation for the current time will be over 2 140 000 000. And that's the problem. A modern 32-bit computer stores a "signed integer" data type, such as time_t, in 32 bits. The first of these bits is used for the positive/negative sign of the integer, while the remaining 31 bits are used to store the number itself. The highest number these 31 data bits can store works out to exactly 2 147 483 647. A time_t value of this exact number, 2 147 483 647, represents January 19, 2038, at 7 seconds past 3:14 AM Greenwich Mean Time. So, at 3:14:07 AM GMT on that fateful day, every time_t used in a 32-bit C or C++ program will reach its upper limit.

One second later, on 19-January-2038 at 3:14:08 AM GMT, disaster strikes.

What will the time_t's do when this happens?
When a signed integer reaches its maximum value and then gets incremented, it wraps around to its lowest possible negative value. (The reasons for this have to do with a binary notation called "two's complement"; I won't bore you with the details here.) This means a 32-bit signed integer, such as a time_t, set to its maximum value of 2 147 483 647 and then incremented by 1, will become -2 147 483 648. Note the "-" sign at the beginning of that large number. A time_t value of -2 147 483 648 would represent December 13, 1901 at 8:45:52 PM GMT.
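Here's a sketch of the wraparound in C (my own illustration; signed overflow is formally undefined behaviour in C, so the increment goes through unsigned arithmetic, which wraps predictably on ordinary two's-complement machines, and it assumes a gmtime() that accepts negative values, as glibc's does):

#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void)
{
    int32_t t = INT32_MAX;                         /* 03:14:07 AM GMT, 19-Jan-2038 */
    int32_t wrapped = (int32_t)((uint32_t)t + 1u); /* becomes INT32_MIN */
    time_t after = wrapped;

    printf("wrapped value: %ld\n", (long)after);      /* -2147483648 */
    printf("as a date: %s", asctime(gmtime(&after))); /* Fri Dec 13 20:45:52 1901 */
    return 0;
}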

So, if all goes normally, 19-January-2038 will suddenly become 13-December-1901 in every time_t across the globe, and every date calculation based on this figure will go haywire. And it gets worse. Most of the support functions that use the time_t data type cannot handle negative time_t values at all. They simply fail and return an error code. Now, most "good" C and C++ programmers know that they are supposed to write their programs in such a way that each function call is checked for an error return, so that the program will still behave nicely even when things don't go as planned. But all too often, the simple, basic, everyday functions they call will "almost never" return an error code, so an error condition simply isn't checked for. It would be too tedious to check everywhere; and besides, the extremely rare conditions that result in the function's failure would "hardly ever" happen in the real world. (Programmers: when was the last time you checked the return value from printf() or malloc()?) When one of the time_t support functions fails, the failure might not even be detected by the program calling it, and more often than not this means the calling program will crash. Spectacularly.
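To make the failure concrete, here's a sketch of the check that's so often omitted (my own example; on a system where time_t is 64 bits wide the call simply succeeds):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct tm when = { 0 };

    when.tm_year  = 2039 - 1900; /* a date past the 32-bit limit */
    when.tm_mday  = 1;
    when.tm_isdst = -1;

    time_t t = mktime(&when);    /* fails where time_t is 32 bits */
    if (t == (time_t)-1) {       /* the error check many programs skip */
        fprintf(stderr, "mktime could not represent this date\n");
        return 1;
    }
    printf("time_t value: %ld\n", (long)t);
    return 0;
}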

What about making time_t unsigned in 32-bit software?
One of the quick-fixes that has been suggested for existing 32-bit software is to re-define time_t as an unsigned integer instead of a signed integer. An unsigned integer doesn't have to waste one of its bits to store the plus/minus sign for the number it represents. This doubles the range of numbers it can store. Whereas a signed 32-bit integer can only go up to 2 147 483 647, an unsigned 32-bit integer can go all the way up to 4 294 967 295. A time_t of this magnitude could represent any date and time from 12:00:00 AM 1-Jan-1970 all the way out to 6:28:15 AM 7-Feb-2106, surely giving us more than enough years for 64-bit software to dominate the planet. It sounds like a good idea at first. We already know that most of the standard time_t handling functions don't accept negative time_t values anyway, so why not just make time_t into a data type that only represents positive numbers?

Well, there's a problem. time_t isn't just used to store absolute dates and times. It's also used, in many applications, to store differences between two date/time values, i.e. to answer the question of "how much time is there between date A and date B?". (MFC's CTimeSpan class is one notorious example.) In these cases, we do need time_t to allow negative values. It is entirely possible that date B comes before date A. Blindly changing time_t to an unsigned integer will, in these parts of a program, make the code unusable.
You'd fix one set of bugs (the Year 2038 Problem) only to introduce a whole new set (time differences not being computed properly).
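A tiny C sketch of my own makes the point; pretend time_t had been redefined as a 32-bit unsigned integer:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t a = 1000;     /* date A */
    uint32_t b = 2000;     /* date B, one thousand seconds later */
    uint32_t diff = a - b; /* should be -1000, but unsigned wraps */

    printf("difference: %u\n", diff); /* prints 4294966296 */
    return 0;
}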

Not very obvious, is it?
The greatest danger with the Year 2038 Problem is its invisibility. The more-famous Year 2000 is a big, round number; it only takes a few seconds of thought, even for a computer-illiterate person, to imagine what might happen when 1999 turns into 2000. But January 19, 2038 is not nearly as obvious. Software companies will probably not think of trying out a Year 2038 scenario before doomsday strikes. Of course, there will be some warning ahead of time. Scheduling software, billing programs, personal reminder calendars, and other such pieces of code that set dates in the near future will fail as soon as one of their target dates exceeds 19-Jan-2038, assuming a time_t is used to store them. But the healthy paranoia that surrounded the search for Year 2000 bugs will be absent. Most software development departments are managed by people with little or no programming experience. It's the managers and their V.P.s that have to think up long-term plans and worst-case scenarios, and insist that their products be tested for them. Testing for dates beyond January 19, 2038 simply might not occur to them. And, perhaps worse, the parts of their software they had to fix for Year 2000 Compliance will be completely different from the parts of their programs that will fail on 19-Jan-2038, so fixing one problem will not fix the other.

Wednesday, September 5, 2007

Unix/Linux: What is the main advantage of creating links to a file instead of copies of the file?

The main advantage is not really that it saves disk space (though it does that too) but, rather, that a change of permissions on the file is applied to all the link access points. The link will show permissions of lrwxrwxrwx but that is for the link itself and not the access to the file to which the link points. Thus, if you want to change the permissions for a command, such as 'su', you only have to do it on the original. With copies you have to find all of the copies and change permission on each of the copies.
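A quick POSIX sketch of the point (my own illustration rather than part of the original answer, and the paths are hypothetical):

#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    /* Create a symbolic link to an existing file. */
    if (symlink("/usr/local/bin/tool", "/usr/local/bin/tool-link") == -1)
        perror("symlink");

    /* One chmod on the original is enough: access through tool-link is
       governed by the original file's permissions, not the link's. */
    if (chmod("/usr/local/bin/tool", 0700) == -1)
        perror("chmod");

    return 0;
}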