source: rtems/cpukit/libfs/src/nfsclient/README @ 0ec9bbc

RTEMS-NFS
=========

An NFS-V2 client implementation for the RTEMS real-time
executive.

Author: Till Straumann <strauman@slac.stanford.edu>, 2002

Copyright 2002, Stanford University and
                Till Straumann <strauman@slac.stanford.edu>

Stanford Notice
***************

Acknowledgement of sponsorship
* * * * * * * * * * * * * * * *
This software was produced by the Stanford Linear Accelerator Center,
Stanford University, under Contract DE-AC03-76SFO0515 with the Department
of Energy.

Contents
--------
I   Overview
  1) Performance
  2) Reference Platform / Test Environment

II  Usage
  1) Initialization
  2) Mounting Remote Server Filesystems
  3) Unmounting
  4) Unloading
  5) Dumping Information / Statistics

III Implementation Details
  1) RPCIOD
  2) NFS
  3) RTEMS Resources Used By NFS/RPCIOD
  4) Caveats & Bugs

IV  Licensing & Disclaimers

I  Overview
-----------

This package implements a simple non-caching NFS
client for RTEMS. Most of the system calls are
supported with the exception of 'mount', i.e. it
is not possible to mount another FS on top of NFS
(mostly because of the difficulty that arises when
mount points are deleted on the server). It
shouldn't be hard to do, though.

Note: this client supports NFS vers. 2 / MOUNT vers. 1;
      NFS version 3 or higher is NOT supported.

The package consists of two modules: RPCIOD and NFS
itself.

 - RPCIOD is a UDP/RPC multiplexor daemon. It takes
   RPC requests from multiple local client threads,
   funnels them through a single socket to multiple
   servers and dispatches the replies back to the
   (blocked) requestor threads.
   RPCIOD does packet retransmission and handles
   timeouts etc.
   Note, however, that it does NOT do any XDR
   marshalling - it is up to the requestor threads
   to do the XDR encoding/decoding. RPCIOD _is_ RPC
   specific, though, because its message dispatching
   is based on the RPC transaction ID.

 - The NFS package maps RTEMS filesystem calls
   to the proper RPCs; it does the XDR work and
   hands the marshalled RPC requests to RPCIOD.
   All of the calls are synchronous, i.e. they
   block until they get a reply.

1) Performance
- - - - - - - -
Performance sucks (due to the lack of
readahead/delayed write and caching). On a fast
(100Mb/s) ethernet, it takes about 20s to copy a
10MB file from NFS to NFS.  I found, however, that
vxWorks' NFS client doesn't seem to be any
faster...

Since there is no buffer cache with read-ahead
implemented, all NFS reads are synchronous RPC
calls. Every read operation involves sending a
request and waiting for the reply. As long as the
overhead (sending request + processing it on the
server) is significant compared to the time it
takes to transfer the actual data, increasing
the amount of data per request results in better
throughput. The UDP packet size limit imposes a
limit of 8k per RPC call, hence reading from NFS
in chunks of 8k is better than chunks of 1k [but
chunks >8k are not possible, i.e., simply not
honoured: read(a_nfs_fd, buf, 20000) returns
8192]. This is similar to the old linux days
(mount with rsize=8k).  You can let stdio take
care of the buffering or use 8k buffers with
explicit read(2) operations. Note that stdio
honours the file-system's st_blksize field
if newlib is compiled with HAVE_BLKSIZE defined.
In this case, stdio uses 8k buffers for files
on NFS transparently. The blocksize NFS
reports can be tuned with a global variable
setting (see nfs.c for details).

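As a purely illustrative example, a plain read(2) loop over an
8k buffer already gets close to the maximum single-threaded
throughput; the file name below is made up:

  #include <fcntl.h>
  #include <unistd.h>

  char    buf[8192];
  ssize_t got;
  int     fd = open("/mnt/remote/bigfile", O_RDONLY);

  /* every read() below is one synchronous RPC moving up to 8k */
  while ((got = read(fd, buf, sizeof(buf))) > 0) {
      /* consume 'got' bytes of buf here */
  }
  close(fd);
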
Further increase of throughput can be achieved
with read-ahead (issuing RPC calls in parallel
[send out the request for block n+1 while you are
waiting for the data of block n to arrive]). Since
this is not handled by the file system itself, you
would have to code this yourself, e.g., using
parallel threads to read a single file from
interleaved offsets.

Another obvious improvement can be achieved if
processing the data takes a significant amount of
time. Then, having a pipeline of threads for
reading data and processing them makes sense
[thread b processes chunk n while thread a blocks
in read(chunk n+1)]; see the sketch below.

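The following is a minimal sketch of such a pipeline (double
buffering with one reader thread), assuming the POSIX thread and
semaphore APIs are enabled in your RTEMS configuration; all names
and the chunk size are arbitrary:

  #include <pthread.h>
  #include <semaphore.h>
  #include <fcntl.h>
  #include <unistd.h>

  #define CHUNK 8192

  static char    buf[2][CHUNK];
  static ssize_t len[2];
  static sem_t   filled, empty;

  static void *reader(void *arg)
  {
      int     fd = *(int *)arg;
      int     i  = 0;
      ssize_t n;
      do {
          sem_wait(&empty);               /* wait for a free buffer      */
          n = len[i] = read(fd, buf[i], CHUNK);
          sem_post(&filled);              /* hand it over for processing */
          i ^= 1;
      } while (n > 0);
      return NULL;
  }

  void readAndProcess(const char *path)   /* hypothetical driver routine */
  {
      pthread_t tid;
      int       fd = open(path, O_RDONLY);
      int       i  = 0;

      sem_init(&filled, 0, 0);
      sem_init(&empty,  0, 2);
      pthread_create(&tid, NULL, reader, &fd);

      for (;;) {
          sem_wait(&filled);              /* wait for the next chunk     */
          if (len[i] <= 0)
              break;                      /* EOF or read error           */
          /* ... process len[i] bytes of buf[i] here ... */
          sem_post(&empty);
          i ^= 1;
      }
      pthread_join(tid, NULL);
      close(fd);
  }
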
Some performance figures:
Software: src/nfsTest.c:nfsReadTest() [data not
          processed in any way].
Hardware: MVME6100
Network:  100baseT-FD
Server:   Linux-2.6/RHEL4-smp [dell precision 420]
File:     10MB

Results:
Single threaded ('normal') NFS read, 1k buffers: 3.46s (2.89MB/s)
Single threaded ('normal') NFS read, 8k buffers: 1.31s (7.63MB/s)
Multi  threaded; 2 readers, 8k buffers/xfers:    1.12s (8.9 MB/s)
Multi  threaded; 3 readers, 8k buffers/xfers:    1.04s (9.6 MB/s)

2) Reference Platform
- - - - - - - - - - -
RTEMS-NFS was developed and tested on

 o RTEMS-ss20020301 (local patches applied)
 o PowerPC G3, G4 on Synergy SVGM series board
   (custom 'SVGM' BSP, to be released soon)
 o PowerPC 604 on MVME23xx
   (powerpc/shared/motorola-powerpc BSP)
 o Test Environment:
    - RTEMS executable running CEXP
    - rpciod/nfs dynamically loaded from TFTPfs
    - EPICS application dynamically loaded from NFS;
      the executing IOC accesses all of its files
      on NFS.

II Usage
---------

After linking into the system and proper initialization
(rtems-NFS supports 'magic' module initialization when
loaded into a running system with the CEXP loader),
you are ready for mounting NFSes from a server
(I avoid the term NFS filesystem because NFS already
stands for 'Network File System').

You should also read the

  - "RTEMS Resources Used By NFS/RPCIOD"
  - "CAVEATS & BUGS"

sections below.

1) Initialization
- - - - - - - - -
NFS consists of two modules which must be initialized:

 a) the RPCIO daemon package; by calling

      rpcUdpInit();

    note that this step must be performed prior to
    initializing NFS.

 b) NFS is initialized by calling

      nfsInit( smallPoolDepth, bigPoolDepth );

    if you supply 0 (zero) values for the pool
    depths, the compile-time default configuration
    is used, which should work fine.

NOTE: when using CEXP to load these modules into a
running system, initialization will be performed
automagically.

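A minimal initialization sequence (e.g., from your application's
startup code) could thus look as follows; the zero pool depths
select the compile-time defaults and the package header is assumed
to pull in the necessary declarations:

  #include <librtemsNfs.h>

  rpcUdpInit();      /* 1st: start the RPC/UDP daemon          */
  nfsInit(0, 0);     /* 2nd: initialize NFS with default pools */
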
2) Mounting Remote Server Filesystems
- - - - - - - - - - - - - - - - - - -

There are two interfaces for mounting an NFS
(a sketch of both follows at the end of this section):

 - The (non-POSIX) RTEMS 'mount()' call:

     mount( &mount_table_entry_pointer,
            &filesystem_operations_table_pointer,
            options,
            device,
            mount_point )

    Note that you must specify a 'mount_table_entry_pointer'
    (use a dummy) - RTEMS' mount() doesn't grok a NULL for
    the first argument.

     o for the 'filesystem_operations_table_pointer', supply

         &nfs_fs_ops

     o options are constants (see RTEMS headers) for specifying
       read-only / read-write mounts.

     o the 'device' string specifies the remote filesystem
       which is to be mounted. NFS expects a string conforming
       to the following format (EBNF syntax):

         [ <uid> '.' <gid> '@' ] <hostip> ':' <path>

       The first optional part of the string allows you
       to specify the credentials to be used for all
       subsequent transactions with this server. If this
       part is omitted, the EUID/EGID of the executing
       thread (i.e. the thread performing the 'mount') are
       used - NFS will still 'remember' these values and use
       them for all future communication with this server.

       The <hostip> part denotes the server IP address
       in standard 'dot' notation. It is followed by
       a colon and the (absolute) path on the server.
       Note that the string must not contain any extra
       characters or whitespace. Example 'device' strings
       are:

         "300.99@192.168.44.3:/remote/rtems/root"

         "192.168.44.3:/remote/rtems/root"

     o the 'mount_point' string identifies the local
       directory (most probably on IMFS) where the NFS
       is to be mounted. Note that the mount point must
       already exist with proper permissions.

 - Alternate 'mount' interface. NFS offers a more
   convenient wrapper taking three string arguments:

        nfsMount(uidgid_at_host, server_path, mount_point)

   This interface does a DNS lookup (see the reentrancy note
   below) and creates the mount point if necessary.

   o the first argument specifies the server and
     optionally the uid/gid to be used for authentication.
     The semantics are exactly as described above:

       [ <uid> '.' <gid> '@' ] <host>

     The <host> part may be either a host _name_ or
     an IP address in 'dot' notation. In the former
     case, nfsMount() uses 'gethostbyname()' to do
     a DNS lookup.

     IMPORTANT NOTE: gethostbyname() is NOT reentrant/
     thread-safe and 'nfsMount()' (if not provided with an
     IP/dot address string) is hence subject to race conditions.

   o the 'server_path' and 'mount_point' arguments
     are described above.
     NOTE: If the mount point does not exist yet,
           nfsMount() tries to create it.

   o if nfsMount() is called with a NULL 'uidgid_at_host'
     argument, it lists all currently mounted NFS.
3) Unmounting
- - - - - - -
An NFS can be unmounted using the RTEMS 'unmount()'
call (yep, it is unmount() - not umount()):

  unmount(mount_point)

Note that you _must_ supply the mount point (string
argument). It is _not_ possible to specify the
'mountee' when unmounting. NFS implements no
convenience wrapper for this (yet), essentially because
(although this sounds unbelievable) it is non-trivial
to look up the path leading to an RTEMS filesystem
directory node.

4) Unloading
- - - - - - -
After unmounting all NFS from the system, the NFS
and RPCIOD modules may be stopped and unloaded.
Just call 'nfsCleanup()' and 'rpcUdpCleanup()',
in this order. You should evaluate the return value
of these routines, which is non-zero if the respective
module refuses to yield (e.g. because there are
still mounted filesystems); see the sketch below.
Again, when unloading is done by CEXP this is
transparently handled.

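A minimal shutdown sequence could look like this (assuming
"/mnt/remote" is the only NFS mount point still in use):

  /* unmount all NFS first; then stop NFS before RPCIOD.  Both
   * cleanup routines return non-zero if they refuse to yield,
   * in which case everything is left loaded.
   */
  if (unmount("/mnt/remote") == 0) {
      if (nfsCleanup() == 0)
          rpcUdpCleanup();
  }
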
5) Dumping Information / Statistics
- - - - - - - - - - - - - - - - - -

Rudimentary RPCIOD statistics are printed
to a file (stdout if NULL) by

  int rpcUdpStats(FILE *f)

A list of all currently mounted NFS can be
printed to a file (stdout if NULL) using

  int nfsMountsShow(FILE *f)

For convenience, this routine is also called
by nfsMount() when it is supplied NULL arguments.

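For example:

  rpcUdpStats(stdout);      /* RPCIOD statistics             */
  nfsMountsShow(stdout);    /* list of currently mounted NFS */

(passing NULL instead of stdout works as well, as described above)
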
III Implementation Details
--------------------------

1) RPCIOD
- - - - -

RPCIOD was created to

a) avoid non-reentrant librpc calls.
b) support 'asynchronous' operation over a single
   socket.

RPCIOD is a daemon thread handling 'transaction objects'
(XACTs) through a UDP socket.  XACTs are marshalled RPC
calls/replies associated with RPC servers and requestor
threads.

requestor thread:                 network:

       XACT                        packet
        |                            |
        V                            V
  | message queue |              ( socket )
        |                            |  ^
        ---------->          <-----  |  |
                     RPCIOD             |
                   /       --------------
           timeout/         (re) transmission


A requestor thread drops a transaction into
the message queue and goes to sleep.  The XACT is
picked up by rpciod, which listens for events from
three sources:

  o the request queue
  o packet arrival at the socket
  o timeouts

RPCIOD sends the XACT to its destination server and
enqueues the pending XACT into an ordered list of
outstanding transactions.

When a packet arrives, RPCIOD (based on the RPC transaction
ID) looks up the matching XACT and wakes up the requestor,
who can then XDR-decode the RPC results found in the XACT
object's buffer.

When a timeout expires, RPCIOD examines the outstanding
XACT that is responsible for the timeout. If its lifetime
has not expired yet, RPCIOD resends the request. Otherwise,
the XACT's error status is set and the requestor is woken up.

RPCIOD dynamically adjusts the retransmission intervals
based on the average round-trip time measured (on a per-server
basis).

Having the requestors event driven (rather than blocking
e.g. on a semaphore) is geared towards supporting many
different requestors (one synchronization object per
requestor would be needed otherwise).

Requestors who want to do asynchronous IO need a different
interface which will be added in the future.

1.a) Reentrancy
- - - - - - - -
RPCIOD makes no non-reentrant librpc calls.

1.b) Efficiency
- - - - - - - -
We shouldn't bother about efficiency until pipelining (read-ahead/
delayed write) and caching are implemented. The round-trip delay
associated with every single RPC transaction clearly is a big
performance killer.

Nevertheless, I could not resist the temptation to eliminate
the extra copy step involved with socket IO:

A user data object has to be XDR encoded into a buffer. The
buffer is then given to the socket, where it is copied into MBUFs.
(The network chip driver might even do more copying.)

Likewise, on reception, 'recvfrom' copies MBUFs into a user
buffer which is XDR decoded into the final user data object.

Eliminating the copying into (possibly multiple) MBUFs by
'sendto()' is actually a piece of cake. RPCIOD uses the
'sosend()' routine [properly wrapped], supplying a single
MBUF header which directly points to the marshalled buffer
:-)

Getting rid of the extra copy on reception was (only a little)
harder: I derived an 'XDR-mbuf' stream from SUN's xdr_mem which
allows for XDR-decoding out of an MBUF chain obtained by
soreceive().

2) NFS
- - - -
The actual NFS implementation is straightforward and essentially
'passive' (no threads created). Any RTEMS task executing a
filesystem call dispatched to NFS (such as 'opendir()', 'lseek()'
or 'unlink()') ends up XDR encoding the arguments, dropping an
XACT into RPCIOD's message queue and going to sleep.
When woken up by RPCIOD, the XACT is decoded (using the XDR-mbuf
stream mentioned above) and the properly cooked-up results are
returned.

3) RTEMS Resources Used By NFS/RPCIOD
- - - - - - - - - - - - - - - - - - -

The RPCIOD/NFS package uses the following resources. Some
parameters are compile-time configurable - consult the
source files for details.

RPCIOD:
 o 1 task
 o 1 message queue
 o 1 socket/filedescriptor
 o 2 semaphores (a third one is temporarily created during
   rpcUdpCleanup()).
 o 1 RTEMS EVENT (by default RTEMS_EVENT_30).
   IMPORTANT: this event is used by _every_ thread executing
              NFS system calls and hence is RESERVED.
 o 3 events only used by RPCIOD itself, i.e. these must not
   be sent to RPCIOD by any other thread (except for the intended
   use, of course). The events involved are 1, 2 and 3.
 o preemption disabled sections:      NONE
 o sections with interrupts disabled: NONE
 o NO 'timers' are used (timer code would run in IRQ context)
 o memory usage: n.a.

NFS:
 o 2 message queues
 o 2 semaphores
 o 1 semaphore per mounted NFS
 o 1 slot in driver entry table (for major number)
 o preemption disabled sections:      NONE
 o sections with interrupts disabled: NONE
 o 1 task + 1 semaphore temporarily created when
   listing mounted filesystems (rtems_filesystem_resolve_location())

4) CAVEATS & BUGS
- - - - - - - - -
Unfortunately, some bugs crawl around in the filesystem generics.
(Some of them might already be fixed in versions later than
rtems-ss-20020301.)
I recommend using the patch distributed with RTEMS-NFS.

 o RTEMS uses/used (Joel said it has been fixed already) a 'short'
   ino_t which is not enough for NFS.
   The driver detects this problem and enables a workaround. In rare
   situations (mainly involving 'getcwd()'), improper inode comparison
   may result (due to the restricted size, stat() returns st_ino modulo
   2^16). In most cases, however, st_dev is compared along with st_ino,
   which will give correct results (different files may yield identical
   st_ino but they will have different st_dev). However, there is
   code (in getcwd(), for example) which assumes that files residing
   in one directory must be hosted by the same device and hence omits
   the st_dev comparison. In such a case, the workaround will fail.

   NOTE: changing the size (sys/types.h) of ino_t from 'short' to 'long'
         is strongly recommended. It is NOT included in the patch, however,
         as this is a major change requiring ALL of your sources to
         be recompiled.

   THE ino_t SIZE IS FIXED IN GCC-3.2/NEWLIB-1.10.0-2 DISTRIBUTED BY
   OAR.

 o You may work around most filesystem bugs by observing the following
   rules:

    * never use chroot() (fixed by the patch)
    * never use getpwent(), getgrent() & friends - they are NOT THREAD
      safe (fixed by the patch)
    * NEVER use rtems_libio_share_private_env() - not even with the
      patch applied. Just DON'T - it is broken by design.
    * All threads which have their own userenv (i.e. which have called
      rtems_libio_set_private_env()) SHOULD 'chdir("/")' before
      terminating. Otherwise (i.e. if their cwd is on NFS), it will
      be impossible to unmount the NFS involved.

 o The patch slightly changes the semantics of 'getpwent()' and
   'getgrent()' & friends (to what is IMHO correct anyways - the patch is
   also needed to fix another problem, however): with the patch applied,
   the passwd and group files are always accessed from the 'current' user
   environment, i.e. a thread that has changed its 'root' or 'uid' might
   not be able to access these files anymore.

 o NOTE: RTEMS 'mount()' / 'unmount()' are NOT THREAD SAFE.

 o The NFS protocol has no 'append' or 'seek_end' primitive. The client
   must query the current file size (this client uses cached info) and
   change the local file pointer accordingly (in 'O_APPEND' mode).
   Obviously, this involves a race condition and hence multiple clients
   writing to the same file may lead to corruption.

IV Licensing & Disclaimers
--------------------------

NFS is distributed under the SLAC License - consult the
separate 'LICENSE' file.

Government disclaimer of liability
- - - - - - - - - - - - - - - - -
Neither the United States nor the United States Department of Energy,
nor any of their employees, makes any warranty, express or implied,
or assumes any legal liability or responsibility for the accuracy,
completeness, or usefulness of any data, apparatus, product, or process
disclosed, or represents that its use would not infringe privately
owned rights.

Stanford disclaimer of liability
- - - - - - - - - - - - - - - - -
Stanford University makes no representations or warranties, express or
implied, nor assumes any liability for the use of this software.

Maintenance of notice
- - - - - - - - - - -
In the interest of clarity regarding the origin and status of this
software, Stanford University requests that any recipient of it maintain
this notice affixed to any distribution by the recipient that contains a
copy or derivative of this software.