OpenBSD Kernel Internals: Creation of a Process, from User Space to Kernel Space

GDB + Qemu (env)

Hello readers,

I know this post is a little late, but I have been busy with some other professional commitments. 🙂

This time, let’s discuss process creation in the OpenBSD operating system, from user space down to kernel space.

As an example, we will take a user-space process launched from the command-line interface (console), such as “ls”, and follow what happens in kernel space as a result.

I will divide this series into three parts:

  • creation
  • execution
  • exit

The creation of a process alone took me quite some time to learn, and tracing it from user space to kernel space had to be done line by line.

I used gdb to debug the kernel and analyze it line by line.

Now, I will not waste any more of your time.

Let’s dive from user space into kernel space and see the beauty of Puffy, the OpenBSD pufferfish.

I have broken the whole flow, and the kernel functions it uses, into points, which I hope makes it easier to read and learn.

Now, suppose you have launched the “ls” command from the CLI (xterm).

Here, the parent process is “ksh”, the default shell in OpenBSD, which invokes “ls” or any other command.

In our case, the new process is created by sys_fork(), the kernel side of the fork system call, which internally calls fork1() (https://man.openbsd.org/fork1.9).

From the fork1(9) kernel developer’s manual:

fork1() creates a new process out of p1, which should be the current thread. This function is used primarily to implement the fork(2) and vfork(2) system calls, as well as the kthread_create(9) function.

Life cycle of a process (in brief):

“ls” → fork(2) → sys_fork() → fork1() → sys_execve() → sys_exit() → exit1()
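For reference, the user-space half of that life cycle looks roughly like the minimal C sketch below, assuming a POSIX system: the shell (the parent) calls fork(2), the child replaces itself with the command via execvp(3), and the parent waits for it with waitpid(2). The helper name run_command() is hypothetical; this is not ksh’s actual code.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with arguments argv, the way a shell does.
 * Returns the child's exit status, or -1 on error / abnormal termination.
 */
static int
run_command(char *const argv[])
{
	pid_t pid;
	int status;

	pid = fork();			/* enters the kernel: sys_fork() -> fork1() */
	if (pid == -1) {
		perror("fork");
		return -1;
	}
	if (pid == 0) {
		/* Child: replace this process image (sys_execve() in the kernel). */
		execvp(argv[0], argv);
		perror("execvp");	/* only reached if exec failed */
		_exit(127);
	}

	/* Parent: wait until the child exits (sys_exit() -> exit1() on its side). */
	if (waitpid(pid, &status, 0) == -1) {
		perror("waitpid");
		return -1;
	}
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, with char *argv[] = { "ls", "-l", NULL };, calling run_command(argv) lists the current directory and returns the exit status of ls.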

Under the hood of fork1()

When you run “ls”, the shell calls fork() in libc, which enters the kernel through the fork system call handler, sys_fork().

sys_fork()

FORK_FORK: a macro indicating that the call is made by the fork(2) system call. It is used only for statistics.

#define FORK_FORK 0x00000001

  • So, the flags variable is set to 1 (FORK_FORK), because the call is made by fork(2).
  • If the process is being traced for fork events (PTRACE_FORK), flags is also updated with FORK_PTRACE; otherwise it is left as it is. In both cases, sys_fork() then calls fork1().
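The flag logic in sys_fork() can be modelled in a few lines of user-space C. This is a sketch under assumptions: FORK_FORK’s value is the one shown above, while PTRACE_FORK is treated as a bit in the parent’s ptrace event mask and the added flag is called FORK_PTRACE; both of those values are placeholders chosen for illustration, not copied from the OpenBSD headers.

```c
#define FORK_FORK	0x00000001	/* value shown above */
#define FORK_PTRACE	0x00000400	/* ASSUMED value, for illustration only */
#define PTRACE_FORK	0x0002		/* ASSUMED bit in the ptrace event mask */

/*
 * Hypothetical model of how sys_fork() builds the flags it passes to fork1():
 * always FORK_FORK, plus FORK_PTRACE when the parent is traced for fork events.
 */
static int
sys_fork_flags(int ptrace_event_mask)
{
	int flags = FORK_FORK;

	if (ptrace_event_mask & PTRACE_FORK)
		flags |= FORK_PTRACE;
	return flags;
}
```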

Now, on to fork1().

fork1() initial code
  • In the above code, curp->p_p->ps_comm is “ksh”, the parent process that will fork to run “ls” (user space).
  • It first declares some process structures and then sets uid = curp->p_ucred->cr_ruid, i.e. the real user ID of the parent.
  • Then comes the structure holding the process address-space information.
  • Then some more variables, a ptrace_state structure pointer, and sanity checks using KASSERT.
  • fork_check_maxthread(uid) checks and tracks the number of threads in the system; the uid decides whether the caller may use the slots reserved for root.
  • The thread count must stay below maxthread, and below maxthread - 5 for any non-root uid, because the last 5 slots below maxthread are reserved for root.
  • If the limit is reached, it prints the tablefull message (at most once every 10 seconds) and the fork fails. Otherwise, it increments the number of threads.

fork_check_maxthread(uid)
  • After fork_check_maxthread(), the same kind of check is repeated for the number of processes. You can have a look at it in the fork1() code screenshot.
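The check can be modelled in user space as in the sketch below. It assumes a tiny maxthread of 10 so the reserved slots are easy to see, uses time(3) as a stand-in for the kernel’s ratecheck(9) and fprintf() as a stand-in for tablefull(), and returns EAGAIN, the error fork(2) reports when a process limit would be exceeded. The globals are toy variables, not the kernel’s own.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

static int nthreads;		/* current number of threads (toy model) */
static int maxthread = 10;	/* ASSUMED small limit for the demo */

/*
 * Model of fork_check_maxthread(): non-root users may not take the
 * last 5 slots, and nobody may go beyond maxthread.
 */
static int
fork_check_maxthread(uid_t uid)
{
	if ((nthreads >= maxthread - 5 && uid != 0) || nthreads >= maxthread) {
		static time_t lastlog;		/* 0 means "never logged" */
		time_t now = time(NULL);

		/* Rate-limit the message to at most once every 10 seconds. */
		if (lastlog == 0 || now - lastlog >= 10) {
			lastlog = now;
			fprintf(stderr, "thread: table is full\n");
		}
		return EAGAIN;
	}
	nthreads++;
	return 0;
}
```

With maxthread = 10, a normal user gets 5 threads and root can still create 5 more after that.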

Now, let’s proceed further.

fork1() code continued
  • It increments the process count of this user via chgproccnt(uid, 1).

chgproccnt()
  • The uidinfo structure maintains the resource consumption counts of every uid, including the process count and socket buffer space usage.
  • The uid_find() function looks up and returns the uidinfo structure for uid. If no uidinfo structure exists for uid, a new one is allocated and initialized.

Then, chgproccnt() adds diff to ui_proccnt, the number of processes of that uid, and returns the new count.
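A toy version of chgproccnt() and uid_find() in user-space C is sketched below. It assumes a simple linked list instead of whatever lookup structure the kernel really uses, and the struct keeps only the fields needed here, so it illustrates the idea rather than the real struct uidinfo.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Simplified stand-in for struct uidinfo: a per-uid process count. */
struct uidinfo {
	uid_t		 ui_uid;
	long		 ui_proccnt;
	struct uidinfo	*ui_next;	/* toy singly linked list */
};

static struct uidinfo *uihead;

/* Look up the uidinfo for uid, allocating and initializing one if needed. */
static struct uidinfo *
uid_find(uid_t uid)
{
	struct uidinfo *uip;

	for (uip = uihead; uip != NULL; uip = uip->ui_next)
		if (uip->ui_uid == uid)
			return uip;

	uip = calloc(1, sizeof(*uip));	/* zeroed: ui_proccnt starts at 0 */
	if (uip == NULL) {
		perror("calloc");	/* the toy simply gives up */
		exit(1);
	}
	uip->ui_uid = uid;
	uip->ui_next = uihead;
	uihead = uip;
	return uip;
}

/* Add diff to uid's process count and return the new count. */
static long
chgproccnt(uid_t uid, int diff)
{
	struct uidinfo *uip = uid_find(uid);
	long count = (uip->ui_proccnt += diff);

	if (count < 0) {
		/* a negative count means the bookkeeping is broken */
		fprintf(stderr, "chgproccnt: procs < 0\n");
		abort();
	}
	return count;
}
```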

After that, for a non-privileged uid, it checks whether the number of processes is greater than the soft resource limit on the number of processes (RLIMIT_NPROC). In gdb, I found that limit to be 9223372036854775807, which is 2^63 - 1, the value of RLIM_INFINITY on OpenBSD, i.e. no limit.

Have a look at the screenshot below for the actual values:

(ddd) gdb output for resource limit

If a non-privileged user exceeds that limit, fork1() undoes the increment by calling chgproccnt() with -1 as the diff parameter, decreases the global number of processes and threads, and the fork fails.
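The increment, check, and roll-back sequence can be sketched as follows. The counters are hypothetical globals that pretend every fork comes from one user, and nproc_soft_limit() reads the calling program’s own RLIMIT_NPROC soft limit with getrlimit(2); it is only there to show where such a limit comes from.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Toy counters standing in for the kernel's bookkeeping (one user only). */
static long nprocesses, nthreads, user_proccnt;

/*
 * Model of this step in fork1(): bump the counts, then refuse (and roll
 * back) if a non-root user would exceed the RLIMIT_NPROC soft limit.
 */
static int
account_new_process(uid_t uid, rlim_t soft_limit)
{
	long count;

	nprocesses++;			/* assume the global limit checks passed */
	nthreads++;
	count = ++user_proccnt;		/* plays the role of chgproccnt(uid, 1) */
	if (uid != 0 && (rlim_t)count > soft_limit) {
		user_proccnt--;		/* chgproccnt(uid, -1) */
		nprocesses--;
		nthreads--;
		return EAGAIN;
	}
	return 0;
}

/* Read this program's RLIMIT_NPROC soft limit (may be RLIM_INFINITY). */
static rlim_t
nproc_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NPROC, &rl) == -1)
		return RLIM_INFINITY;
	return rl.rlim_cur;
}
```

With a soft limit as large as the 9223372036854775807 seen in gdb, the comparison never succeeds in practice, so a normal user is not refused at this step.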

  • Next, the uvm_uarea_alloc() function (https://man.openbsd.org/uvm.9) allocates a thread’s ‘uarea’, the memory where its kernel stack and PCB are stored.

Then it checks whether the allocation succeeded: if uaddr is zero (no uarea could be allocated), it again undoes the per-user process count and the global process and thread counts, and the fork fails.

Next come two important functions:

→ thread_new(struct proc *parent, vaddr_t uaddr)

→ process_new(struct proc *p, struct process *parent, int flags)

thread_new(curp, uaddr)

Here, the thread_new() function creates the thread for our new child process, the one that will become “ls” in our case. The struct proc is taken from the proc_pool pool via the pool_get() function.

Then, the state of the thread is set to SIDL, which means that the process/thread is being created by fork, and p->p_flag is set to 0.

Next, a section of struct proc is zeroed. See the code snippet below from sys/proc.h:

code snippet for members that will be zeroed upon creation in fork, via memset

All the members in the above snippet are zeroed via memset() when the thread is created in fork.

Then, the section starting at parent->p_startcopy is copied to p->p_startcopy via memcpy(). Have a look at the screenshot below to see which members are copied.

code snippet for the members that will be copied upon creation in fork
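The zero/copy trick relies on marker names for the first and last member of each section, so one memset() and one memcpy() cover whole ranges of struct fields. Below is a minimal sketch of the same pattern, assuming a toy struct whose field names are illustrative rather than the real struct proc layout; malloc() stands in for pool_get() on proc_pool, and TOY_SIDL’s value is an assumption.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_SIDL	1	/* ASSUMED value: "being created by fork" */

/* Toy struct proc: field names are illustrative, not the real layout. */
struct toy_proc {
	char	 p_stat;
	int	 p_flag;

/* The following fields are all zeroed upon creation in fork. */
#define	p_startzero	p_dupfd
	int	 p_dupfd;
	long	 p_cpticks;
	void	*p_wchan;
/* End area that is zeroed on creation. */
#define	p_endzero	p_startcopy

/* The following fields are all copied upon creation in fork. */
#define	p_startcopy	p_sigmask
	unsigned int	 p_sigmask;
	unsigned char	 p_priority;
	unsigned char	 p_usrpri;
/* End area that is copied on creation. */
#define	p_endcopy	p_addr

	void	*p_addr;		/* set explicitly, like the u-area address */
};

/* Model of the zero/copy step in thread_new(). */
static struct toy_proc *
toy_thread_new(const struct toy_proc *parent, void *uaddr)
{
	struct toy_proc *p = malloc(sizeof(*p));	/* stands in for pool_get() */

	if (p == NULL)
		return NULL;
	p->p_stat = TOY_SIDL;
	p->p_flag = 0;

	/* Zero everything from p_startzero up to (not including) p_endzero. */
	memset(&p->p_startzero, 0,
	    (char *)&p->p_endzero - (char *)&p->p_startzero);

	/* Copy the parent's fields from p_startcopy up to p_endcopy. */
	memcpy(&p->p_startcopy, &parent->p_startcopy,
	    (char *)&p->p_endcopy - (char *)&p->p_startcopy);

	p->p_addr = uaddr;
	return p;
}
```

Adding a field to either section then only means placing it between the markers; the memset() and memcpy() calls never change.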
  • crhold(p->p_ucred) increments the reference count of the struct ucred structure, that is, p->p_ucred->cr_ref++.
  • Then, the thread’s uarea address is cast to (struct user *)uaddr and saved as the kernel virtual address of the thread’s u-area.
  • Next, it initializes a timeout.
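Below is a user-space sketch of the hold/release pair around shared credentials, assuming C11 atomics. The toy struct has only a counter and a uid, and toy_crhold() and toy_crfree() are hypothetical stand-ins for crhold() and the matching crfree().

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>

/* Stand-in for struct ucred: just a reference count and a real uid. */
struct toy_ucred {
	atomic_uint	cr_ref;
	uid_t		cr_ruid;
};

/* Take another reference, as crhold(p->p_ucred) does for the new thread. */
static void
toy_crhold(struct toy_ucred *cr)
{
	atomic_fetch_add(&cr->cr_ref, 1);
}

/* Drop a reference; free the credentials when the last one goes away. */
static int
toy_crfree(struct toy_ucred *cr)
{
	if (atomic_fetch_sub(&cr->cr_ref, 1) == 1) {
		free(cr);
		return 1;	/* this was the last reference */
	}
	return 0;
}
```

Because parent and child point at the same credentials, nothing is copied at fork time; only the counter changes.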

timeout_set(timeout, fn, arg) initializes the timeout structure so that, when the timeout fires, the function fn is called with arg. Here is a simplified version of timeout_set() to show how it works:

void
timeout_set(struct timeout *new, void (*fn)(void *), void *arg)
{
        new->to_func = fn;
        new->to_arg = arg;
        new->to_flags = TIMEOUT_INITIALIZED;
}

scheduler_fork_hook(parent, p): a macro that copies the parent’s p_estcpu into the child’s p_estcpu. p_estcpu holds an estimate of the amount of CPU time the process has used recently.

/* Inherit the parent’s scheduler history */
#define scheduler_fork_hook(parent, child) do {    \
 (child)->p_estcpu = (parent)->p_estcpu;           \
} while (0)

Then, thread_new() returns the newly created thread p.

Another important function is process_new(), which creates the process in a similar fashion to what we have seen above in thread_new().

process_new(struct proc *p, struct process *parent, int flags)

process_new(p, curpr, flags)

In the above code snippet, the same steps happen again: the process is taken from process_pool via pool_get(), then zeroed using memset() and copied using memcpy(). For the detailed explanation, please go through the thread_new() function first.

Next, the process is initialized with the process_initialize() function.

process_initialize(pr, p)

ps_mainproc: the original and main thread of the process. It is only special for the handling of p_xstat and some signal and ptrace behaviours that need to be fixed.

→ Store the initial thread p in pr->ps_mainproc.

→ Initialize the tail queue whose head is pr->ps_threads, then insert p at the tail of that queue.

→ Set the reference count to 1, that is, pr->ps_refcnt = 1.

→ Point the initial thread’s process pointer at pr (p->p_p = pr).

→ Give the process the same credentials as the initial thread.

→ Sanity-check the new thread and the new process with KASSERT.

→ Initialize the list whose head is pr->ps_children.

→ Again, initialize a timeout (for details, see thread_new() above).
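The queue and list handling in process_initialize() uses the TAILQ_* and LIST_* macros from <sys/queue.h>, which are also available in user space on OpenBSD and on glibc systems. Here is a sketch of just that part, assuming toy structures; the field name p_thr_link is an assumption, not taken from the kernel.

```c
#include <stddef.h>
#include <sys/queue.h>

struct toy_process;

/* Toy thread: linked into its process's thread queue. */
struct toy_proc {
	struct toy_process	*p_p;		/* owning process */
	TAILQ_ENTRY(toy_proc)	 p_thr_link;	/* ASSUMED field name */
};

/* Toy process with a thread queue and a children list. */
struct toy_process {
	struct toy_proc			*ps_mainproc;
	TAILQ_HEAD(, toy_proc)		 ps_threads;
	LIST_HEAD(, toy_process)	 ps_children;
	LIST_ENTRY(toy_process)		 ps_sibling;
	int				 ps_refcnt;
};

/* Model of process_initialize(pr, p), limited to the list handling. */
static void
toy_process_initialize(struct toy_process *pr, struct toy_proc *p)
{
	pr->ps_mainproc = p;
	TAILQ_INIT(&pr->ps_threads);			/* empty thread queue */
	TAILQ_INSERT_TAIL(&pr->ps_threads, p, p_thr_link);
	pr->ps_refcnt = 1;
	p->p_p = pr;
	LIST_INIT(&pr->ps_children);			/* no children yet */
}
```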

Now, after the process initialization, pid allocation takes place.

pr->ps_pid = allocpid();

allocpid() returns an unused pid. Internally, it calls arc4random_uniform(), which in turn uses arc4random(), so the candidate pid is a random number.

To make sure the pid is unused, it verifies whether the new pid is already taken by any process, process group, or zombie process, using the function ispidtaken(pid_t pid), which internally calls these functions:

  • prfind(pid_t pid): Locate a process by number
  • pgfind(pid_t pgid): Locate a process group by number
  • zombiefind(pid_t pid): Locate a zombie process by number
code snippet for allocpid and ispidtaken
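The allocation loop is a simple “draw until free” scheme. A sketch under assumptions: pids are drawn from 1 to an assumed TOY_PID_MAX of 99999 (the kernel’s exact range may differ), a boolean table stands in for prfind(), pgfind() and zombiefind(), and outside OpenBSD rand() replaces arc4random_uniform() purely so the demo compiles; rand() is neither unbiased nor suitable for real pid randomization.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

#define TOY_PID_MAX	99999	/* ASSUMED upper bound for pids */

/* Toy stand-in for prfind()/pgfind()/zombiefind(): a table of used ids. */
static bool pid_used[TOY_PID_MAX + 1];

static bool
ispidtaken(pid_t pid)
{
	return pid_used[pid];
}

/* Random number in [0, upper). */
static uint32_t
random_below(uint32_t upper)
{
#ifdef __OpenBSD__
	return arc4random_uniform(upper);
#else
	/* Demo-only fallback: biased and predictable. */
	return (uint32_t)rand() % upper;
#endif
}

/* Model of allocpid(): draw random pids until one is not in use. */
static pid_t
allocpid(void)
{
	pid_t pid;

	do {
		pid = (pid_t)random_below(TOY_PID_MAX) + 1;	/* 1 .. TOY_PID_MAX */
	} while (ispidtaken(pid));
	return pid;
}
```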

Now, the pointer to the parent process is stored in pr->ps_pptr.

The reference count of the process limit structure, struct plimit, is incremented.

The vnode of the parent’s executable is stored in pr->ps_textvp, that is, pr->ps_textvp = parent->ps_textvp;

if (pr->ps_textvp)
        vref(pr->ps_textvp);	/* vref: take a vnode reference */

That is, if a valid vnode is found, vref() increments the v_usecount field of the executable’s struct vnode.

Now, the process flags are inherited from the parent:

pr->ps_flags = parent->ps_flags & (PS_SUGID | PS_SUGIDEXEC | PS_PLEDGE | PS_EXECPLEDGE | PS_WXNEEDED);
/* that is, parent->ps_flags & (0x10 | 0x20 | 0x100000 | 0x400000 | 0x200000) */
if (vnode of controlling terminal != NULL)
        pr->ps_flags |= parent->ps_flags & PS_CONTROLT;
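The flag inheritance above is just a bit mask. Here is a small sketch using the flag values shown in that snippet; PS_CONTROLT’s value is an assumption, and the controlling-terminal check is reduced to a boolean parameter.

```c
#include <stdint.h>

/* Values as shown above; PS_CONTROLT's value is ASSUMED. */
#define PS_CONTROLT	0x00000001
#define PS_SUGID	0x00000010
#define PS_SUGIDEXEC	0x00000020
#define PS_PLEDGE	0x00100000
#define PS_WXNEEDED	0x00200000
#define PS_EXECPLEDGE	0x00400000

/* Model of how process_new() derives the child's ps_flags from the parent. */
static uint32_t
inherit_ps_flags(uint32_t parent_flags, int has_controlling_tty)
{
	uint32_t flags;

	/* Only these bits survive into the child. */
	flags = parent_flags & (PS_SUGID | PS_SUGIDEXEC | PS_PLEDGE |
	    PS_EXECPLEDGE | PS_WXNEEDED);
	if (has_controlling_tty)
		flags |= parent_flags & PS_CONTROLT;
	return flags;
}
```

For a parent with every bit set, the child ends up with 0x700030, or 0x700031 when there is a controlling terminal.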

process_new() continued…

Checks:

* if child_shares_the_file_descriptor_table_with_parent:
         pr->ps_fd = fdshare(parent)      /* share the table */
  else
         pr->ps_fd = fdcopy(parent)       /* copy the table */
* if child_shares_the_parent's_signal_actions:
         pr->ps_sigacts = sigactsshare(parent)  /* share */
  else
         pr->ps_sigacts = sigactsinit(parent)   /* copy */
* if child_shares_the_parent's_address_space:
         pr->ps_vmspace = uvmspace_share(parent)  /* share */
  else
         pr->ps_vmspace = uvmspace_fork(parent)   /* copy */
* if parent_is_being_profiled:
         startprofclock(pr);    /* start profiling on the new process */
* if child_inherits_ptracing:
         pr->ps_flags |= parent->ps_flags & PS_PTRACED
* if no_signal_or_zombie_at_exit_requested:
         pr->ps_flags |= PS_NOZOMBIE    /* no signal or zombie at exit */
* if no_signals_stats_or_swapping_requested:
         pr->ps_flags |= PS_SYSTEM
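The “share or copy” choice appears three times above (file descriptors, signal actions, address space). Here is a sketch of what the two options mean for a file-descriptor table, assuming a toy reference-counted table; toy_fdshare() and toy_fdcopy() are hypothetical stand-ins for fdshare() and fdcopy().

```c
#include <stdlib.h>
#include <string.h>

#define TOY_NFILES	8	/* ASSUMED tiny table size */

/* Toy file-descriptor table with a reference count. */
struct toy_filedesc {
	int	fd_refcnt;
	int	fd_files[TOY_NFILES];	/* pretend "open file" ids */
};

/* Share: the child uses the very same table, one more reference. */
static struct toy_filedesc *
toy_fdshare(struct toy_filedesc *fdp)
{
	fdp->fd_refcnt++;
	return fdp;
}

/* Copy: the child gets its own duplicate with a fresh reference count. */
static struct toy_filedesc *
toy_fdcopy(const struct toy_filedesc *fdp)
{
	struct toy_filedesc *newfdp = malloc(sizeof(*newfdp));

	if (newfdp == NULL)
		return NULL;
	memcpy(newfdp->fd_files, fdp->fd_files, sizeof(newfdp->fd_files));
	newfdp->fd_refcnt = 1;
	return newfdp;
}

/* Pick one of the two, like the first check above. */
static struct toy_filedesc *
toy_fork_fd(struct toy_filedesc *parent, int share)
{
	return share ? toy_fdshare(parent) : toy_fdcopy(parent);
}
```

A real fdcopy() also has to take a reference on every open file in the table; the toy skips that.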

Then, PS_EMBRYO is ORed into pr->ps_flags:

pr->ps_flags |= PS_EMBRYO;	/* New process, not yet fledged */

membar_producer() forces visibility of all of the above changes: all stores preceding the memory barrier will reach global visibility before any stores after the memory barrier reach global visibility.

In short, it ensures that the stores initializing the new process become globally visible before the store that publishes it on the allprocess list, which is the next step.
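In user space, a similar ordering guarantee can be sketched with C11 fences. This is a swapped-in technique, not the kernel primitive: atomic_thread_fence(memory_order_release) is at least as strong as a store-store barrier like membar_producer(), and the reader needs a matching acquire fence (membar_consumer() would be the kernel counterpart). The struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_process {
	int	ps_pid;
	int	ps_flags;
};

/* Published pointer, playing the role of the allprocess list head. */
static _Atomic(struct toy_process *) published;

/* Writer: initialize first, fence, then publish. */
static void
publish_process(struct toy_process *pr, int pid, int flags)
{
	pr->ps_pid = pid;
	pr->ps_flags = flags;
	/* All stores above become visible before the publishing store below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&published, pr, memory_order_relaxed);
}

/* Reader: load the pointer, then fence before reading the fields. */
static struct toy_process *
lookup_process(void)
{
	struct toy_process *pr;

	pr = atomic_load_explicit(&published, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* pairs with the release fence */
	return pr;
}
```

Without the fences, another CPU could see the new pointer but stale field values.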

Now, the new process pr is inserted at the head of the allprocess list.

Finally, process_new() returns pr.
fork1() continued…

Substructures: p->p_fd and p->p_vmspace are direct copies of pr->ps_fd and pr->ps_vmspace.

substructures

Checks:

** if (the process has no signals, stats, or swapping) then atomically set the bit:

atomic_setbits_int(&pr->ps_flags, PS_SYSTEM);

** if (the parent must be suspended until the child terminates (by calling _exit(2) or abnormally) or calls execve(2), as with vfork(2)) then atomically set PS_PPWAIT on the child and PS_ISPWAIT on the parent:

atomic_setbits_int(&pr->ps_flags, PS_PPWAIT);
atomic_setbits_int(&curpr->ps_flags, PS_ISPWAIT);

#ifdef KTRACE
/* Some KTRACE related things */
#endif

cpu_fork(curp, p, NULL, NULL, func, arg ? arg : p) creates or updates the PCB and makes the child ready to run, as the comment in the source explains:

/*
 * Finish creating the child thread. cpu_fork() will copy
 * and update the pcb and make the child ready to run. The
 * child will exit directly to user mode via child_return()
 * on its first time slice and will not return here.
 */

Address space:

vm = pr->ps_vmspace;

if (call is done by fork syscall); then
        increment the number of fork() system calls
        add the data and stack sizes of vm to the total size forked by fork()
else if (call is done by vfork() syscall); then
        do the same as for fork, but with the vfork() counters
else
        increment the number of kernel threads created
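The counter logic can be sketched as follows. The struct and field names are hypothetical (modelled on the description above, not copied from the kernel), FORK_VFORK’s value is assumed, and the sizes are taken to be in pages.

```c
/* Toy counters in the spirit of the kernel's fork statistics. */
struct toy_forkstat {
	long	cntfork, sizfork;	/* fork(2): calls, pages */
	long	cntvfork, sizvfork;	/* vfork(2): calls, pages */
	long	cntkthread;		/* kernel threads */
};

#define FORK_FORK	0x00000001	/* value shown earlier */
#define FORK_VFORK	0x00000002	/* ASSUMED value */

/* Record one fork of an address space with dsize data + ssize stack pages. */
static void
count_fork(struct toy_forkstat *fs, int flags, long dsize, long ssize)
{
	if (flags & FORK_FORK) {
		fs->cntfork++;
		fs->sizfork += dsize + ssize;
	} else if (flags & FORK_VFORK) {
		fs->cntvfork++;
		fs->sizvfork += dsize + ssize;
	} else {
		fs->cntkthread++;	/* neither fork nor vfork */
	}
}
```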

Check:

If the process is being traced and the child is created by the fork system call, malloc(9) is called to allocate uninitialized kernel memory for struct ptrace_state *newptstat; the size of the allocation is sizeof(*newptstat).

Then a thread ID is allocated, that is, p->p_tid = alloctid(); This works like allocpid(): it calls arc4random directly and uses the tfind() function, which finds a thread by number, to make sure the new ID is not already in use.

* insert the new element p at the head of the allproc list (the list of all threads).
* insert the new element p at the head of the thread hash list.
* insert the new element pr at the head of the process hash list.
* insert the new element pr after the curpr element.
* insert the new element pr at the head of the parent’s children list.
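Each of these insertions is a single <sys/queue.h> macro call. Here is a sketch of three of them, assuming a toy process structure: a head insertion on a global list, an insertion right after curpr on a second list (a process-group-style list here, which is an assumption about what that list is), and a head insertion on the parent’s children list. The thread lists and hash lists are left out.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Toy process that sits on three lists at once. */
struct toy_process {
	int				 ps_pid;
	LIST_ENTRY(toy_process)		 ps_list;	/* global list */
	LIST_ENTRY(toy_process)		 ps_pglist;	/* ASSUMED: group-style list */
	LIST_ENTRY(toy_process)		 ps_sibling;	/* parent's children */
	LIST_HEAD(, toy_process)	 ps_children;
};

LIST_HEAD(toy_processlist, toy_process);

/* Link the new process pr next to its parent curpr. */
static void
link_new_process(struct toy_processlist *all, struct toy_process *curpr,
    struct toy_process *pr)
{
	LIST_INIT(&pr->ps_children);
	LIST_INSERT_HEAD(all, pr, ps_list);			/* head of global list */
	LIST_INSERT_AFTER(curpr, pr, ps_pglist);		/* right after curpr */
	LIST_INSERT_HEAD(&curpr->ps_children, pr, ps_sibling);	/* parent's children */
}
```

LIST_INSERT_AFTER() needs curpr to be on that list already, which is why it is written relative to curpr instead of to a list head.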
fork1() continued…

Again, if the child process is being ptraced:

  • The parent’s pid is saved for ptrace, that is, pr->ps_oppid = curpr->ps_pid.
  • If the parent pointer of the child differs from the parent pointer of the current process, proc_reparent(pr, curpr->ps_pptr) makes the current process’s parent (for a traced process, its tracer) the new parent of the child pr.
  • Then it checks whether newptstat is non-NULL; in our case it holds the kernel virtual address returned by malloc(9). If so, the ptrace state is set: the child takes over the structure (pr->ps_ptstat = newptstat) and newptstat is set to NULL.

→ Then the ptrace status of both the curpr process and the pr process is updated:

curpr->ps_ptstat->pe_report_event = PTRACE_FORK;
pr->ps_ptstat->pe_report_event = PTRACE_FORK;
curpr->ps_ptstat->pe_other_pid = pr->ps_pid;
pr->ps_ptstat->pe_other_pid = curpr->ps_pid;
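The newptstat handling follows a common kernel pattern: allocate up front, hand ownership over when it is needed, and free whatever was not handed over. Here is a user-space sketch of that pattern, assuming toy structures; the ps_traced flag, TOY_PTRACE_FORK’s value and toy_fork_ptrace() are hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>

/* Stand-in for struct ptrace_state. */
struct toy_ptrace_state {
	int	pe_report_event;
	pid_t	pe_other_pid;
};

struct toy_process {
	pid_t			 ps_pid;
	int			 ps_traced;	/* stand-in for the traced flag */
	struct toy_ptrace_state	*ps_ptstat;
};

#define TOY_PTRACE_FORK	0x0002	/* ASSUMED event value */

static int
toy_fork_ptrace(struct toy_process *curpr, struct toy_process *pr)
{
	struct toy_ptrace_state *newptstat = NULL;

	/* Allocate up front, before the child is committed. */
	if (curpr->ps_traced) {
		newptstat = malloc(sizeof(*newptstat));
		if (newptstat == NULL)
			return -1;
	}

	pr->ps_traced = curpr->ps_traced;	/* the child inherits tracing */

	if (pr->ps_traced && newptstat != NULL) {
		pr->ps_ptstat = newptstat;	/* the child now owns the buffer */
		newptstat = NULL;		/* so it is not freed below */
		curpr->ps_ptstat->pe_report_event = TOY_PTRACE_FORK;
		pr->ps_ptstat->pe_report_event = TOY_PTRACE_FORK;
		curpr->ps_ptstat->pe_other_pid = pr->ps_pid;
		pr->ps_ptstat->pe_other_pid = curpr->ps_pid;
	}

	free(newptstat);	/* leftover, if any; free(NULL) does nothing */
	return 0;
}
```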

Now, for the new process, the accounting bits are set and the process is marked as complete:

  • The start time of the process is recorded (with nanosecond resolution).
  • The accounting flags are set to AFORK, which means forked but not execed.
  • The PS_EMBRYO bit is atomically cleared, so the process is no longer “not yet fledged”.
  • Then, unless the new child is an idle thread, it is made runnable and added to a run queue by the fork_thread_start() function.
  • For an idle thread, arg is instead recorded as the CPU the thread will run on.

Then, if newptstat still points to the memory allocated by malloc(9), that is, it was not handed over to the child, that memory is freed via free(9).

Interested parties are notified about the new process via KNOTE.

Now, the statistics counters for successful forks are updated:

uvmexp.forks++; /* -->For forks */
if (flags & FORK_PPWAIT)
        uvmexp.forks_ppwait++; /* --> counter for forks where parent waits */
if (flags & FORK_SHAREVM)
        uvmexp.forks_sharevm++; /* --> counter for forks where vmspace is shared */

Now, a pointer to the new thread is passed back to the caller:

if (rnewprocp != NULL)
        *rnewprocp = p;
fork1 continued…
  • When the parent has to wait (FORK_PPWAIT, as with vfork(2)), PS_PPWAIT is set on the child and PS_ISPWAIT on ourselves, the parent, which then goes to sleep on its own process via tsleep() until the child execs or exits.
  • If the child was started with tracing enabled and the current process is being traced, a SIGTRAP signal is posted to the current process so that its tracer is alerted.
  • Finally, the child’s pid is returned to the parent process, and fork1() itself returns 0 to report success: return (0);

Finally, I saw in the debugger that after fork1() returns, execution continues in sys/arch/amd64/amd64/trap.c, which handles the system call and sets up the frame.

Some of the machine-independent (MI) functions used there are defined in the sys/sys/syscall_mi.h file, such as mi_syscall(), mi_syscall_return(), and mi_child_return().

After the system call has been handled in trap.c, the child goes on to the sys_execve system call to run “ls”. I will explain that in the second part, and I will cover more of the trap.c code in upcoming posts, since this one has already become long.
