#3175 closed enhancement (fixed)

Merge FreeBSD timecounter changes from 2015-01-20 to now

Reported by: Sebastian Huber Owned by: Sebastian Huber
Priority: normal Milestone: 5.1
Component: score Version: 5
Severity: normal Keywords:
Cc: Blocked By: #3557


Change History (29)

comment:1 Changed on 10/12/17 at 05:05:59 by Hans Petter Selasky <hselasky@…>

In ea0b339/rtems:

timecounter: Merge FreeBSD change r279728

Add mutex support to the pps_ioctl() API in the kernel. Bump kernel version to reflect structure change.

PR: 196897
MFC after: 1 week

Update #3175.

comment:2 Changed on 10/12/17 at 05:06:10 by Ian Lepore <ian@…>

In 0aef6fb/rtems:

timecounter: Merge FreeBSD change r280012

Use sbuf_printf() for sysctl strings instead of stack buffers and snprintf().

Update #3175.

comment:3 Changed on 10/12/17 at 05:06:22 by Ian Lepore <ian@…>

In 51304dde/rtems:

timecounter: Merge FreeBSD change r282424

Implement a mechanism for making changes in the kernel<->driver PPS interface without breaking ABI or API compatibility with existing drivers.

The existing data structures used to communicate between the kernel and
driver portions of PPS processing contain no spare/padding fields and no
flags field or other straightforward mechanism for communicating changes
in the structures or behaviors of the code. This makes it difficult to
MFC new features added to the PPS facility. ABI compatibility is
important; out-of-tree drivers in module form are known to exist. (Note
that the existing api_version field in the pps_params structure must
contain the value mandated by RFC 2783 and any RFCs that come along after.)

These changes introduce a pair of abi-version fields which are filled in
by the driver and the kernel respectively to indicate the interface
version. The driver sets its version field before calling the new
pps_init_abi() function. That lets the kernel know how much of the
pps_state structure is understood by the driver and it can avoid using
newer fields at the end of the structure that it knows about if the driver
is a lower version. The kernel fills in its version field during the init
call, letting the driver know what features and data the kernel supports.

To implement the new version information in a way that is backwards
compatible with code from before these changes, the high bit of the
lightly-used 'kcmode' field is repurposed as a flag bit that indicates the
driver is aware of the abi versioning scheme. Basically if this bit is
clear that indicates a "version 0" driver and if it is set the driver_abi
field indicates the version.

These changes also move the recently-added 'mtx' field of pps_state from
the middle to the end of the structure, and make the kernel code that uses
this field conditional on the driver being abi version 1 or higher. It
changes the only driver currently supplying the mtx field, usb_serial, to
use pps_init_abi().

Reviewed by: hselasky@

Update #3175.

comment:4 Changed on 10/12/17 at 05:06:34 by Konstantin Belousov <kib@…>

In 4d0ade9/rtems:

timecounter: Merge FreeBSD change r284256

Tweaks for r284178:

Do not include machine/atomic.h explicitely, the header is already included
by sys/systm.h.

Force inlining of tc_getgen() and tc_setgen(). The functions are used
more than once, which causes compilers with non-aggressive inlining
policies to generate calls.

Suggested by: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Update #3175.

comment:5 Changed on 10/12/17 at 05:06:46 by Konstantin Belousov <kib@…>

In 0163063/rtems:

timecounter: Merge FreeBSD change r285286

Reimplement the ordering requirements for the timehands updates, and for timehands consumers, by using fences.

Ensure that the timehands->th_generation reset to zero is visible
before the data update is visible [*]. tc_setget() allowed data update
writes to become visible before generation (but not on TSO

Remove tc_setgen(), tc_getgen() helpers, use atomics inline [].

Noted by: alc [*]
Requested by: bde []
Reviewed by: alc, bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks

Update #3175.

comment:6 Changed on 10/12/17 at 05:06:58 by Ian Lepore <ian@…>

In ec349b58/rtems:

timecounter: Merge FreeBSD change r286423

RFC 2783 requires a status of ETIMEDOUT, not EWOULDBLOCK, on a timeout.

Update #3175.

comment:7 Changed on 10/12/17 at 05:07:09 by Ian Lepore <ian@…>

In 7494681/rtems:

timecounter: Merge FreeBSD change r286429

Only process the PPS event types currently enabled in pps_params.mode.

This makes the PPS API behave correctly, but isn't ideal -- we still end
up capturing PPS data for non-enabled edges, we just don't process the
data into an event that becomes visible outside of kern_tc. That's because
the event type isn't passed to pps_capture(), so it can't do the filtering.
Any solution for capture filtering is going to require touching every driver.

Update #3175.

comment:8 Changed on 10/12/17 at 05:07:21 by Ian Lepore <ian@…>

In f1463c8/rtems:

timecounter: Merge FreeBSD change r286701

If a specific timecounter has been chosen via sysctl, and a new timecounter with higher quality registers (presumably in a module that has just been loaded), do not undo the user's choice by switching to the new timecounter.

Document that behavior, and also the fact that there is no way to unregister
a timecounter (and thus no way to unload a module containing one).

Update #3175.

comment:9 Changed on 10/12/17 at 05:07:33 by Ian Lepore <ian@…>

In 2b6d00f/rtems:

timecounter: Merge FreeBSD change r304285

Constify the pointers to eventtimer and timecounter name strings.

The need for this appears as soon as you try to set the names to something
that isn't a "quoted literal". (I'm actually confused why quoted strings
aren't a problem as well, we must have some warning disabled.)

Update #3175.

comment:10 Changed on 10/12/17 at 05:07:44 by Konstantin Belousov <kib@…>

In f013c14/rtems:

timecounter: Merge FreeBSD change r288216

Use per-cpu values for base and last in tc_cpu_ticks(). The values are updated lockess, different CPUs write its own view of timecounter state. The critical section is done for safety, callers of tc_cpu_ticks() are supposed to already enter critical section, or to own a spinlock.

The change fixes sporadical reports of too high values reported for
the (W)CPU on platforms that do not provide cpu ticker and use
tc_cpu_ticks(), in particular, arm*.

Diagnosed and reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week

Update #3175.

comment:11 Changed on 10/12/17 at 05:07:56 by Ngie Cooper <ngie@…>

In 4cd742e/rtems:

timecounter: Merge FreeBSD change r290257

Define fhard in pps_event(..) only when PPS_SYNC is defined to mute an -Wunused-but-set-variable warning

Reported by: FreeBSD_HEAD_amd64_gcc4.9 jenkins job
Sponsored by: EMC / Isilon Storage Division

Update #3175.

comment:12 Changed on 10/12/17 at 05:08:08 by Pedro Giffuni <pfg@…>

In 65f2cd7a/rtems:

timecounter: Merge FreeBSD change r298819

sys/kern: spelling fixes in comments.

No functional change.

Update #3175.

comment:13 Changed on 10/12/17 at 05:08:20 by Pedro Giffuni <pfg@…>

In f6c94601/rtems:

timecounter: Merge FreeBSD change r298981

sys/sys: minor spelling fixes.

While the changes are minor, these headers are very visible.

MFC after: 2 weeks

Update #3175.

comment:14 Changed on 10/12/17 at 05:08:32 by Konstantin Belousov <kib@…>

In d310aa7/rtems:

timecounter: Merge FreeBSD change r303382

Hide the boottime and bootimebin globals, provide the getboottime(9) and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name.

Issue is that boottime* should be adjusted from tc_windup(), which
requires them to be members of the timehands structure. As a
preparation, this commit only introduces the interface.

Some uses of boottime were found doubtful, e.g. NLM uses boottime to
identify the system boot instance. Arguably the identity should not
change on the leap second adjustment, but the commit is about the
timekeeping code and the consumers were kept bug-to-bug compatible.

Tested by: pho (as part of the bigger patch)
Reviewed by: jhb (same)
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302

Update #3175.

comment:15 Changed on 10/12/17 at 05:08:44 by Konstantin Belousov <kib@…>

In 6d3c125/rtems:

timecounter: Merge FreeBSD change r303383

Reduce number of timehands to just two. This is useful because consumers can now be only one tc_windup() call late.

Use C99 initialization.

Tested by: pho (as part of the whole patch)
Reviewed by: jhb (same)
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302

Update #3175.

comment:16 Changed on 10/12/17 at 05:08:56 by Konstantin Belousov <kib@…>

In 464fd5d/rtems:

timecounter: Merge FreeBSD change r303384


Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302

Update #3175.

comment:17 Changed on 10/12/17 at 05:09:08 by Konstantin Belousov <kib@…>

In b48aeaf/rtems:

timecounter: Merge FreeBSD change r303387

Prevent parallel tc_windup() calls, both parallel top-level calls from setclock() and from simultaneous top-level and interrupt. For this, tc_windup() is protected with a tc_setclock_mtx spinlock, in the try mode when called from hardclock interrupt. If spinlock cannot be obtained without spinning from the interrupt context, this means that top-level executes tc_windup() on other core and our try may be avoided.

The boottimebin and boottime variables should be adjusted from
tc_windup(). To be correct, they must be part of the timehands and
read using lockless protocol. Remove the globals and reimplement the
getboottime(9)/getboottimebin(9) KPI using the timehands read

Tested by: pho (as part of the whole patch)
Reviewed by: jhb (same)
Discussed wit: bde
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7302

Update #3175.

comment:18 Changed on 10/12/17 at 05:09:20 by Konstantin Belousov <kib@…>

In c382cc83/rtems:

timecounter: Merge FreeBSD change r303548

Cache getbintime(9) answer in timehands, similarly to getnanotime(9) and getmicrotime(9).

Suggested and reviewed by: bde (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 month

Update #3175.

comment:19 Changed on 10/12/17 at 05:09:33 by Konstantin Belousov <kib@…>

In 7488715/rtems:

timecounter: Merge FreeBSD change r304285

Implement userspace gettimeofday(2) with HPET timecounter.

Right now, userspace (fast) gettimeofday(2) on x86 only works for
RDTSC. For older machines, like Core2, where RDTSC is not C2/C3
invariant, and which fall to HPET hardware, this means that the call
has both the penalty of the syscall and of the uncached hw behind the
QPI or PCIe connection to the sought bridge. Nothing can me done
against the access latency, but the syscall overhead can be removed.
System already provides mappable /dev/hpetX devices, which gives
straight access to the HPET registers page.

Add yet another algorithm to the x86 'vdso' timehands. Libc is updated
to handle both RDTSC and HPET. For HPET, the index of the hpet device
to mmap is passed from kernel to userspace, index might be changed and
libc invalidates its mapping as needed.

Remove cpu_fill_vdso_timehands() KPI, instead require that
timecounters which can be used from userspace, to provide
tc_fill_vdso_timehands{,32}() methods. Merge i386 and amd64
libc/<arch>/sys/vdso_gettc.c into one source file in the new
libc/x86/sys location.
vdso_gettc() internal interface is changed
to move timecounter algorithm detection into the MD code.

Measurements show that RDTSC even with the syscall overhead is faster
than userspace HPET access. But still, userspace HPET is three-four
times faster than syscall HPET on several Core2 and SandyBridge?

Tested by: Howard Su <howard0su@…>
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D7473

Update #3175.

comment:20 Changed on 10/12/17 at 05:09:45 by Ed Schouten <ed@…>

In a9219e7/rtems:

timecounter: Merge FreeBSD change r310053

Add labels to sysctls related to clocks.

Sysctls like kern.eventtimer.et.*.quality currently embed the name of
the clock device. This is problematic for the Prometheus metrics
exporter for two reasons:

  • Some of those clocks have dashes in their names, which Prometheus doesn't allow to be used in metric names.
  • It doesn't allow for extracting the same property of all clocks on the system from within a single query.

Attach these nodes to have a label, so that the Prometheus metrics
exporter gives these metric a uniform name with the name of the clock
attached as a label.

Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D8775

Update #3175.

comment:21 Changed on 10/12/17 at 05:09:57 by Eric van Gyzen <vangyzen@…>

In 952b42b6/rtems:

timecounter: Merge FreeBSD change r315280

When the RTC is adjusted, reevaluate absolute sleep times based on the RTC

POSIX 2008 says this about clock_settime(2):

If the value of the CLOCK_REALTIME clock is set via clock_settime(),
the new value of the clock shall be used to determine the time
of expiration for absolute time services based upon the
CLOCK_REALTIME clock. This applies to the time at which armed
absolute timers expire. If the absolute time requested at the
invocation of such a time service is before the new value of
the clock, the time service shall expire immediately as if the
clock had reached the requested time normally.

Setting the value of the CLOCK_REALTIME clock via clock_settime()
shall have no effect on threads that are blocked waiting for
a relative time service based upon this clock, including the
nanosleep() function; nor on the expiration of relative timers
based upon this clock. Consequently, these time services shall
expire when the requested relative interval elapses, independently
of the new or old value of the clock.

When the real-time clock is adjusted, such as by clock_settime(3),
wake any threads sleeping until an absolute real-clock time.
Such a sleep is indicated by a non-zero td_rtcgen. The sleep functions
will set that field to zero and return zero to tell the caller
to reevaluate its sleep duration based on the new value of the clock.

At present, this affects the following functions:


I'm working on adding clock_nanosleep(2), which will also be affected.

Reported by: Sebastian Huber <sebastian.huber@…>
Reviewed by: jhb, kib
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9791

Update #3175.

comment:22 Changed on 10/12/17 at 05:10:09 by Eric van Gyzen <vangyzen@…>

In 5167d0e/rtems:

timecounter: Merge FreeBSD change r315287

Add missing pieces of r315280

I moved this branch from github to a private server, and pulled from the
wrong one when committing r315280, so I failed to include two recent commits.
Thankfully, they were only cosmetic and were included in the review.

Add documentation, polish comments, and improve style(9).

Tested by: pho (r315280)
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9791

Update #3175.

comment:23 Changed on 10/12/17 at 05:10:22 by Konstantin Belousov <kib@…>

In bcbbe76/rtems:

timecounter: Merge FreeBSD change r324528

The th_bintime, th_microtime and th_nanotime members of the timehand all cache the last system time (uptime + boottime). Only the format differs. Do not re-calculate the bintime and simply use the value used to calculate the microtime and nanotime.

Group all the updates under the relevant comment. Remove obsoleted
XXX part.

Submitted by: Sebastian Huber <sebastian.huber@…>
MFC after: 1 week

Update #3175.

comment:24 Changed on 10/12/17 at 05:10:35 by Sebastian Huber <sebastian.huber@…>

In d8b6f1c/rtems:

timecounter: Update FreeBSD identifiers

Update #3175.

comment:25 Changed on 11/09/17 at 06:27:14 by Sebastian Huber


Milestone renamed

comment:26 Changed on 12/05/17 at 14:28:46 by Joel Sherrill

Is this done and can be closed? With an appropriate comment.

comment:27 Changed on 10/13/18 at 22:46:08 by Joel Sherrill

Resolution: fixed
Status: assignedclosed

Closing since no feedback since Dec 2017

comment:28 Changed on 10/20/18 at 15:15:06 by Amar Takhar

Whoops, Joel closed this. I need to test the fix implemented in #3557 to make sure no developers that do not have an account on trac receive an email.

comment:29 Changed on 10/20/18 at 15:21:27 by Amar Takhar

Blocked By: 3557 added
Note: See TracTickets for help on using tickets.