
Console View


Simon Sapin
dirstate-tree: Avoid BTreeMap double-lookup when inserting a dirstate entry

The child nodes of a given node in the tree-shaped dirstate are kept in a
`BTreeMap` where keys are file names as strings. Finding or inserting a value
in the map takes `O(log(n))` string comparisons, which adds up when constructing
the tree.

The `entry` API allows finding a "spot" in the map that may or may not be
occupied and then accessing that value or inserting a new one without doing
the map lookup again. However the current API is limited in that calling
`entry` requires an owned key (and so a memory allocation), even if it ends up
not being used in the case where the map already has a value with an equal key.

This is still a win, with 4% better end-to-end time for `hg status` measured
here with hyperfine:

Benchmark #1: ../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
  Time (mean ± σ):      1.337 s ±  0.018 s    [User: 892.9 ms, System: 437.5 ms]
  Range (min … max):    1.316 s …  1.373 s    10 runs

Benchmark #2: ./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
  Time (mean ± σ):      1.291 s ±  0.008 s    [User: 853.4 ms, System: 431.1 ms]
  Range (min … max):    1.283 s …  1.309 s    10 runs

  './hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ran
    1.04 ± 0.02 times faster than '../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1'

* ./hg is this revision
* ../hg2/hg is its parent
* $REPO is an old snapshot of mozilla-central
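
The difference between the two approaches can be sketched as follows. This is only an illustration: a plain `BTreeMap<String, u32>` stands in for the child-node map, and both function names are hypothetical rather than taken from hg-core.

```rust
use std::collections::BTreeMap;

/// Two lookups on a miss: one for `get_mut`, another for `insert`.
fn bump_double_lookup(map: &mut BTreeMap<String, u32>, name: &str) {
    if let Some(count) = map.get_mut(name) {
        *count += 1;
    } else {
        map.insert(name.to_owned(), 1);
    }
}

/// One lookup: `entry` finds the spot once, then either updates the
/// existing value or inserts a new one there. The cost is that the key
/// must be owned up front, even when the spot is already occupied.
fn bump_with_entry(map: &mut BTreeMap<String, u32>, name: &str) {
    *map.entry(name.to_owned()).or_insert(0) += 1;
}
```

Both functions leave the map in the same state; only the number of string comparisons and allocations differs.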

Differential Revision: https://phab.mercurial-scm.org/D10550
Simon Sapin
dirstate-tree: Add #[timed] attribute to `status` and `DirstateMap::read`

When running with a `RUST_LOG=trace` environment variable, the `micro_timer`
crate prints the duration taken by each call to functions with that attribute.

Differential Revision: https://phab.mercurial-scm.org/D10552
Simon Sapin
dirstate-tree: Make `DirstateMap` borrow from a bytes buffer

… that has the contents of the `.hg/dirstate` file.
This only applies to the tree-based flavor of `DirstateMap`.

For now only the entire `&[u8]` slice is stored, so this is not useful yet.

Adding a lifetime parameter to the `DirstateMap` struct (in hg-core) makes
Python bindings non-trivial because we keep that struct in a Python object
that has a dynamic lifetime tied to Python’s reference-counting and GC.
As long as we keep the `PyBytes` that owns the borrowed bytes buffer next to
the borrowing struct, the buffer will live long enough for the borrows to stay
valid. However this relationship cannot be expressed in safe Rust code in a
way that would satisfy the borrow-checker. We use `unsafe` code to erase
that lifetime parameter, and encapsulate it in a safe abstraction similar to
the owning-ref crate: https://docs.rs/owning_ref/

Differential Revision: https://phab.mercurial-scm.org/D10557
Simon Sapin
dirstate-tree: Parallelize the status algorithm with Rayon

The `rayon` crate exposes "parallel iterators" that work like normal iterators
but dispatch work on different items to an implicit global thread pool.

Differential Revision: https://phab.mercurial-scm.org/D10551
Simon Sapin
dirstate-tree: Borrow paths from the "on disk" bytes

Use std::borrow::Cow to avoid some memory allocations and copying.
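
A minimal sketch of the idea, under stated assumptions: the `Paths` type and the newline-separated format are hypothetical simplifications, not the real dirstate format or the hg-core types.

```rust
use std::borrow::Cow;

/// Paths either point into the buffer read from disk or own their bytes.
struct Paths<'on_disk> {
    paths: Vec<Cow<'on_disk, [u8]>>,
}

impl<'on_disk> Paths<'on_disk> {
    /// Hypothetical simplified format: paths separated by `\n`.
    fn parse(buffer: &'on_disk [u8]) -> Self {
        let paths = buffer
            .split(|&byte| byte == b'\n')
            .filter(|path| !path.is_empty())
            .map(Cow::Borrowed) // no allocation, no copy
            .collect();
        Paths { paths }
    }

    /// Paths added later do not come from the buffer, so they are owned.
    fn add(&mut self, path: Vec<u8>) {
        self.paths.push(Cow::Owned(path));
    }
}
```

Code that reads the paths does not need to care which variant it gets, since `Cow<[u8]>` dereferences to `[u8]` either way.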

Differential Revision: https://phab.mercurial-scm.org/D10560
Simon Sapin
dirstate-tree: Use HashMap instead of BTreeMap

BTreeMap has the advantage of its "natural" iteration order being the one we need
in the status algorithm. With HashMap however, iteration order is undefined so
we need to allocate a Vec and sort it explicitly.

Unfortunately many BTreeMap operations are slower than in HashMap, and skipping
that extra allocation and sort is not enough to compensate.
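
The extra step that `HashMap` requires can be sketched like this. The value type `u32` and the function name are placeholders for illustration only.

```rust
use std::collections::HashMap;

/// Returns the children sorted by name, as the status walk needs.
/// With a `BTreeMap`, plain iteration would already be in this order.
fn sorted_children<'a>(children: &'a HashMap<String, u32>) -> Vec<(&'a String, &'a u32)> {
    // The extra allocation: collect references into a `Vec`...
    let mut sorted: Vec<_> = children.iter().collect();
    // ...and the extra sort, by key only.
    sorted.sort_unstable_by(|a, b| a.0.cmp(b.0));
    sorted
}
```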

Switching to HashMap + sort makes `hg status` 17% faster in one test case,
as measured with hyperfine:

Benchmark #1: ../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
  Time (mean ± σ):    765.0 ms ±  8.8 ms    [User: 1.352 s, System: 0.747 s]
  Range (min … max):  751.8 ms … 778.7 ms    10 runs

Benchmark #2: ./hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1
  Time (mean ± σ):    651.8 ms ±  9.9 ms    [User: 1.251 s, System: 0.799 s]
  Range (min … max):  642.2 ms … 671.8 ms    10 runs

  './hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1' ran
    1.17 ± 0.02 times faster than '../hg2/hg status -R $REPO --config=experimental.dirstate-tree.in-memory=1'

* ./hg is this revision
* ../hg2/hg is its parent
* $REPO is an old snapshot of mozilla-central

Differential Revision: https://phab.mercurial-scm.org/D10553
Simon Sapin
dirstate-tree: Fold "tracked descendants" counter update in main walk

For the purpose of implementing `has_tracked_dir` (which means "has tracked
descendants") without an expensive sub-tree traversal, we maintain a counter
of tracked descendants on each "directory" node of the tree-shaped dirstate.

Before this changeset, mutating or inserting a node at a given path would involve:

* Walking the tree from root through ancestors to find the node or the spot
  where to insert it
* Looking at the previous node if any to decide what counter update is needed
* Performing any node mutation
* Walking the tree *again* to update counters in ancestor nodes

When profiling `hg status` on a large repo, this second walk takes time
while loading the dirstate from disk.

It turns out we have enough information to decide before the first tree walk
what counter update is needed. This changeset merges the two walks, gaining
~10% of the total time for `hg update` (in the same hyperfine benchmark as
the previous changeset).
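
A rough sketch of a merged walk, for the simplest case where the entry is known to be new (as when loading the dirstate from disk), so the counter update is known before the walk starts. The `Node` layout and function names are assumptions for illustration, not the hg-core definitions.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Node {
    /// `Some(tracked)` for a file entry, `None` for a pure directory.
    entry: Option<bool>,
    tracked_descendants: u32,
    children: HashMap<String, Node>,
}

/// Inserts a file known not to be in the tree yet, updating ancestor
/// counters during the same (single) walk from the root.
fn insert_new(root: &mut Node, path: &str, tracked: bool) {
    let mut node = root;
    for name in path.split('/') {
        // `node` is an ancestor of the new entry: update it on the way down.
        if tracked {
            node.tracked_descendants += 1;
        }
        node = node.children.entry(name.to_owned()).or_default();
    }
    node.entry = Some(tracked);
}

/// No sub-tree traversal needed: the counter already has the answer.
fn has_tracked_dir(root: &Node, path: &str) -> bool {
    let mut node = root;
    for name in path.split('/') {
        match node.children.get(name) {
            Some(child) => node = child,
            None => return false,
        }
    }
    node.tracked_descendants > 0
}
```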


Profiling was done by compiling with this `.cargo/config`:

    debug = true

then running with:

    py-spy record -r 500 -n -o /tmp/hg.json --format speedscope -- \
    ./hg status -R $REPO --config experimental.dirstate-tree.in-memory=1

then visualizing the recorded JSON file in https://www.speedscope.app/

Differential Revision: https://phab.mercurial-scm.org/D10554
Simon Sapin
rust: Use `&HgPath` instead of `&HgPathBuf` in many APIs

Getting the former (through `Deref`) is almost the only useful thing one can
do with the latter anyway. With this change, APIs become more flexible for the
"provider" of these paths, which may store something else that `Deref`s to
`HgPath`, such as `std::borrow::Cow<HgPath>`. Using `Cow` can help reduce memory
allocations and copying.
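
The same pattern exists in the standard library, so it can be illustrated by analogy with `Path` and `PathBuf` standing in for `HgPath` and `HgPathBuf`. The function below is a made-up example, not a Mercurial API.

```rust
use std::borrow::Cow;
use std::path::{Path, PathBuf};

/// Accepting `&Path` rather than `&PathBuf` lets callers pass anything
/// that dereferences to `Path`: a `PathBuf`, a `Cow<Path>`, or a `&Path`.
fn has_hidden_name(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.starts_with('.'))
}
```

A caller holding a `PathBuf` passes `&owned`, one holding a `Cow<Path>` passes `&cow`, and deref coercion does the rest in both cases.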

Differential Revision: https://phab.mercurial-scm.org/D10558
Simon Sapin
rust: Remove handling of `parents` in `DirstateMap`

The Python wrapper class `dirstatemap` can take care of it.

This removes the need to have both `_rustmap` and `_inner_rustmap`.

Differential Revision: https://phab.mercurial-scm.org/D10555
Simon Sapin
dirstate-tree: Handle I/O errors in status

Errors such as insufficient permissions when listing a directory are logged,
and the algorithm continues without considering that directory.
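
One way to structure such a walk, sketched with the standard library only. Here errors are collected into a `Vec` instead of being logged, and the function name and signature are assumptions, not the actual status code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively lists files under `dir`. A directory that cannot be read
/// is recorded in `errors` and otherwise treated as if it were empty.
fn walk(dir: &Path, files: &mut Vec<PathBuf>, errors: &mut Vec<(PathBuf, io::Error)>) {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(error) => {
            errors.push((dir.to_owned(), error));
            return;
        }
    };
    for entry in entries {
        let entry = match entry {
            Ok(entry) => entry,
            Err(error) => {
                errors.push((dir.to_owned(), error));
                continue;
            }
        };
        let path = entry.path();
        // `DirEntry::file_type` does not follow symbolic links.
        match entry.file_type() {
            Ok(file_type) if file_type.is_dir() => walk(&path, files, errors),
            Ok(_) => files.push(path),
            Err(error) => errors.push((path, error)),
        }
    }
}
```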

Differential Revision: https://phab.mercurial-scm.org/D10549
Simon Sapin
dirstate-tree: Add copy_map_insert and copy_map_remove

Differential Revision: https://phab.mercurial-scm.org/D10488
Simon Sapin
dirstate-tree: Borrow copy source paths from the "on disk" bytes

Use std::borrow::Cow to avoid some memory allocations and copying.

These particular allocations are not visible when profiling (as many files
in a typical repo don’t have a copy source). This change is "warm up"
for doing the same with paths of files themselves, which is more involved
since those paths are used as `HashMap` keys. This gets out of the way the
addition of a lifetime parameter to several types.

Differential Revision: https://phab.mercurial-scm.org/D10559
Simon Sapin
dirstate-tree: Ignore FIFOs etc. in the status algorithm

If a filesystem directory contains anything that is not:

* a "normal" file
* a symbolic link
* or a directory

… act as if that directory entry was not there. For example, if that path was
previously a tracked file, mark it as deleted or removed.
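
The classification could look roughly like the sketch below, which uses only `std::fs`. The `Kind` enum and `kind_of` function are hypothetical names for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Kind {
    File,
    Symlink,
    Dir,
}

/// Returns `None` for anything the status algorithm should ignore.
fn kind_of(path: &Path) -> io::Result<Option<Kind>> {
    // `symlink_metadata` does not follow symlinks, so a link is seen as a link.
    let file_type = fs::symlink_metadata(path)?.file_type();
    Ok(if file_type.is_file() {
        Some(Kind::File)
    } else if file_type.is_symlink() {
        Some(Kind::Symlink)
    } else if file_type.is_dir() {
        Some(Kind::Dir)
    } else {
        // FIFO, socket, device, ...: act as if the entry was not there.
        None
    })
}
```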

Differential Revision: https://phab.mercurial-scm.org/D10548
Simon Sapin
rust: Read dirstate from disk in DirstateMap constructor

Before this changeset, Python code first creates an empty `DirstateMap` Rust
object, then immediately calls its `read` method with a byte string of the
contents of the `.hg/dirstate` file.

This makes that byte string available to the constructor of `DirstateMap`
in the hg-cpython crate. This is a first step towards enabling parts of
`DirstateMap` in the hg-core crate to borrow from this buffer without copying.

Differential Revision: https://phab.mercurial-scm.org/D10556
Simon Sapin
dirstate-tree: Add the new `status()` algorithm

With the dirstate organized in a tree that mirrors the structure of the
filesystem tree, we can traverse both trees at the same time in order to
compare them. This is hopefully more efficient than building multiple
big hashmaps for all of the repository’s contents.
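
At the level of a single directory, traversing both trees at once amounts to a merge of two sorted name lists. The sketch below shows that step only; the `Diff` enum and `compare_sorted` function are illustrative assumptions and leave out recursion into sub-directories and content comparison.

```rust
use std::cmp::Ordering;

#[derive(Debug, PartialEq)]
enum Diff<'a> {
    /// In the dirstate but not on the filesystem.
    Missing(&'a str),
    /// On the filesystem but not in the dirstate.
    Unknown(&'a str),
    /// In both; contents would still need comparing.
    Both(&'a str),
}

/// Both inputs must be sorted. Each name is visited exactly once.
fn compare_sorted<'a>(dirstate: &[&'a str], filesystem: &[&'a str]) -> Vec<Diff<'a>> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < dirstate.len() && j < filesystem.len() {
        match dirstate[i].cmp(filesystem[j]) {
            Ordering::Less => {
                out.push(Diff::Missing(dirstate[i]));
                i += 1;
            }
            Ordering::Greater => {
                out.push(Diff::Unknown(filesystem[j]));
                j += 1;
            }
            Ordering::Equal => {
                out.push(Diff::Both(dirstate[i]));
                i += 1;
                j += 1;
            }
        }
    }
    // Whatever remains on one side has no counterpart on the other.
    out.extend(dirstate[i..].iter().map(|&name| Diff::Missing(name)));
    out.extend(filesystem[j..].iter().map(|&name| Diff::Unknown(name)));
    out
}
```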

Differential Revision: https://phab.mercurial-scm.org/D10547
Simon Sapin
rust: Remove DirstateMap::file_fold_map

This was a HashMap constructed on demand and then cached in the DirstateMap
struct to avoid reconstructing at the next access. However the only use is
in Python bindings converting it to a PyDict. That method in turn is wrapped
in a @cachedproperty in Python code.

This was two redundant layers of caching. This changeset removes the Rust-level
one to keep the Python dict cache, and have bindings create a PyDict by

Differential Revision: https://phab.mercurial-scm.org/D10493
Simon Sapin
dirstate-tree: Add clear_ambiguous_times in the new DirstateMap

Also drive-by refactor it in the other DirstateMap

Differential Revision: https://phab.mercurial-scm.org/D10489
Simon Sapin
rust: Move "lookup" a.k.a. "unsure" paths into `DirstateStatus` struct

Instead of having `status()` returning a tuple of those paths and

Differential Revision: https://phab.mercurial-scm.org/D10494
Simon Sapin
dirstate-tree: Maintain a counter of DirstateEntry’s and copy sources

This allows implementing __len__ for DirstateMap and CopyMap efficiently,
without traversing the tree.

Differential Revision: https://phab.mercurial-scm.org/D10487
Simon Sapin
dirstate-tree: Add has_dir and has_tracked_dir

A node without a `DirstateMap` entry represents a directory.

Only some values of `EntryState` represent tracked files.
A directory is considered "tracked" if it contains any descendant file that
is tracked. To avoid a sub-tree traversal in `has_tracked_dir` we add a
counter for this. A boolean flag would become insufficient when we implement
remove_file and drop_file.

`add_file_node` is more general than needed here, in anticipation of adding
the `add_file` and `remove_file` methods.

Differential Revision: https://phab.mercurial-scm.org/D10490
Simon Sapin
dirstate-tree: Give to `status()` mutable access to the `DirstateMap`

Differential Revision: https://phab.mercurial-scm.org/D10546
Simon Sapin
rust: Add doc-comments to DirstateStatus fields

Differential Revision: https://phab.mercurial-scm.org/D10495
Simon Sapin
dirstate-tree: Add "non normal" and "from other parent" sets

Unlike the other DirstateMap implementation, these sets are not materialized
separately in memory. Instead we traverse the main tree.

Differential Revision: https://phab.mercurial-scm.org/D10492
Simon Sapin
dirstate-tree: Add add_file, remove_file, and drop_file

Again, various counters need to be kept up to date.

Differential Revision: https://phab.mercurial-scm.org/D10491
Simon Sapin
dirstate-tree: Empty shell for a second Rust DirstateMap implementation

For background see description of the previous changeset
"Make Rust DirstateMap bindings go through a trait object".

Add an empty shell for an opt-in second Rust implementation of the
`DirstateMap` type and the `status` function. For now all methods panic.
This can be seen in "action" with:

    ./hg status --config experimental.dirstate-tree.in-memory=1

Differential Revision: https://phab.mercurial-scm.org/D10364
Simon Sapin
dirstate-tree: Add tree traversal/iteration

Like Python’s, Rust’s iterators are "external" in that they are driven
by a caller who calls a `next` method. This is as opposed to "internal"
iterators that drive themselves and call a callback for each item.

Writing an internal iterator traversing a tree is easy with recursion,
but external iterators cannot rely on the call stack in that way:
they must save in an explicit object all state that they need to be
preserved across two `next` calls.

This algorithm uses a `Vec` as a stack that contains what would be
local variables on the call stack if we could use recursion.
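
A minimal sketch of such an external iterator over a toy tree. The `Node` and `Iter` types are assumptions for illustration; each stack element is the "children still to visit" iterator that a recursive function would hold as a local variable.

```rust
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

/// External pre-order iterator over a tree.
struct Iter<'a> {
    stack: Vec<std::slice::Iter<'a, Node>>,
}

impl<'a> Iter<'a> {
    fn new(root: &'a Node) -> Self {
        Iter {
            stack: vec![std::slice::from_ref(root).iter()],
        }
    }
}

impl<'a> Iterator for Iter<'a> {
    type Item = &'a Node;

    fn next(&mut self) -> Option<&'a Node> {
        while let Some(children) = self.stack.last_mut() {
            if let Some(node) = children.next() {
                // "Recurse": remember where we are, then descend.
                self.stack.push(node.children.iter());
                return Some(node);
            }
            // This level is exhausted: "return" to the parent level.
            self.stack.pop();
        }
        None
    }
}
```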

Differential Revision: https://phab.mercurial-scm.org/D10370
Simon Sapin
dirstate-tree: Add parsing only dirstate parents from disk

Differential Revision: https://phab.mercurial-scm.org/D10368
Simon Sapin
rust: Add a Timestamp struct instead of abusing Duration

`SystemTime` would be the standard library type semantically appropriate
instead of `Duration`.

But since the value is coming from Python as a plain integer and used in
dirstate packing code as an integer, let’s make a type that contains a single
integer instead of using one with sub-second precision.

Differential Revision: https://phab.mercurial-scm.org/D10485
Simon Sapin
dirstate-tree: Abstract "non-normal" and "other parent" sets

Instead of exposing `HashSet`s directly, have slightly higher-level
methods for the operations that Python bindings need on them.

Differential Revision: https://phab.mercurial-scm.org/D10363
Simon Sapin
dirstate-tree: Serialize to disk

The existing `pack_dirstate` function relies on implementation details
of `DirstateMap`, so extract some parts of it as separate functions
for use in the tree-based `DirstateMap`.

The `bytes-cast` crate is updated to a version that has an `as_bytes` method,
not just `from_bytes`:

Drive-by refactor `clear_ambiguous_times` which does part of the same thing.

Differential Revision: https://phab.mercurial-scm.org/D10486
Simon Sapin
dirstate-tree: Add `WithBasename` wrapper for `HgPath`

In the tree-shaped dirstate we want to have nodes representing files or
directories, where directory nodes contain a map associating "base" names
to child nodes for child files and directories.

Many dirstate operations expect a full path from the repository root, but
re-concatenating string from nested map keys all the time might be expensive.
Instead, `WithBasename` stores a full path for these operations but
behaves as its base name (last path component) for equality and comparison.

Additionally `inclusive_ancestors` provides the successive map keys
that are needed when inserting a new dirstate node at a given full path.
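
The behavior described above can be sketched as follows, with `&str` standing in for `HgPath`. Field names and method bodies are guesses made for illustration and may differ from the real hg-core type.

```rust
use std::cmp::Ordering;

/// Stores a full path but compares as its base name (last component).
struct WithBasename<'a> {
    full_path: &'a str,
    base_name_start: usize,
}

impl<'a> WithBasename<'a> {
    fn new(full_path: &'a str) -> Self {
        let base_name_start = full_path.rfind('/').map_or(0, |slash| slash + 1);
        WithBasename { full_path, base_name_start }
    }

    fn full_path(&self) -> &'a str {
        self.full_path
    }

    fn base_name(&self) -> &'a str {
        &self.full_path[self.base_name_start..]
    }
}

impl<'a> PartialEq for WithBasename<'a> {
    fn eq(&self, other: &Self) -> bool {
        self.base_name() == other.base_name()
    }
}

impl<'a> Eq for WithBasename<'a> {}

impl<'a> PartialOrd for WithBasename<'a> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl<'a> Ord for WithBasename<'a> {
    fn cmp(&self, other: &Self) -> Ordering {
        self.base_name().cmp(other.base_name())
    }
}

/// For "a/b/c", yields keys for "a", "a/b", then "a/b/c".
fn inclusive_ancestors<'a>(full_path: &'a str) -> impl Iterator<Item = WithBasename<'a>> + 'a {
    full_path
        .match_indices('/')
        .map(move |(slash, _)| &full_path[..slash])
        .chain(std::iter::once(full_path))
        .map(WithBasename::new)
}
```

Because equality and ordering look only at the base name, such a value can serve as a key in a per-directory map while still carrying the full path for operations that need it.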

Differential Revision: https://phab.mercurial-scm.org/D10365
Simon Sapin
dirstate-tree: Add map `get` and `contains_key` methods

Differential Revision: https://phab.mercurial-scm.org/D10369
Simon Sapin
dirstate-tree: Implement DirstateMap::read

This reads the on-disk dirstate in its current format (a flat sequence of
entries) and creates a tree in memory.

Differential Revision: https://phab.mercurial-scm.org/D10367
Simon Sapin
dirstate-tree: Make Rust DirstateMap bindings go through a trait object

This changeset starts a series that adds an experiment to make status faster
by changing the dirstate (first only in memory and later also on disk) to
be shaped as a tree matching the directory structure, instead of the current
flat collection of entries. The status algorithm can then traverse this tree
dirstate at the same time as it traverses the filesystem.

We (Octobus) have made prototypes that show promising results but are prone
to bitrot. We would like to start upstreaming some experimental Rust code that
goes in this direction, but to avoid disrupting users it should only be
enabled by some run-time opt-in while keeping the existing dirstate structure
and status algorithm as-is.

The `DirstateMap` type and `status` function look like the appropriate
boundary. This adds a new trait that abstracts everything Python bindings need
and makes those bindings go through a `dyn` trait object. Later we’ll have two
implementations of this trait, and the same bindings can use either.
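
The shape of that boundary can be sketched like this. The trait name, its three methods, and the set-based stand-in implementations are all hypothetical; the real trait covers everything the Python bindings need.

```rust
use std::collections::{BTreeSet, HashSet};

/// What callers (here, the bindings) are allowed to rely on.
trait MapMethods {
    fn add_file(&mut self, path: &str);
    fn contains(&self, path: &str) -> bool;
    fn len(&self) -> usize;
}

/// Stand-in for the existing flat implementation.
#[derive(Default)]
struct FlatMap {
    paths: HashSet<String>,
}

/// Stand-in for the opt-in second implementation.
#[derive(Default)]
struct TreeMap {
    paths: BTreeSet<String>,
}

impl MapMethods for FlatMap {
    fn add_file(&mut self, path: &str) {
        self.paths.insert(path.to_owned());
    }
    fn contains(&self, path: &str) -> bool {
        self.paths.contains(path)
    }
    fn len(&self) -> usize {
        self.paths.len()
    }
}

impl MapMethods for TreeMap {
    fn add_file(&mut self, path: &str) {
        self.paths.insert(path.to_owned());
    }
    fn contains(&self, path: &str) -> bool {
        self.paths.contains(path)
    }
    fn len(&self) -> usize {
        self.paths.len()
    }
}

/// The implementation is chosen at run time; callers only see the trait.
fn new_map(use_tree: bool) -> Box<dyn MapMethods> {
    if use_tree {
        Box::new(TreeMap::default())
    } else {
        Box::new(FlatMap::default())
    }
}
```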

Differential Revision: https://phab.mercurial-scm.org/D10362
Matt Harbison
tests: stabilize test-persistent-nodemap.t on Windows

Several issues here:

  - Hooks can't invoke shell scripts on Windows, so use `sh` to launch
  - `dd` in MSYS only recognizes `status=noxfer`
  - The `PATH` updating triggered a massive slowdown, but is no longer needed

I have no idea why, but removing the `PATH` update substantially increased the
speed of the test.  It was finishing at ~4:30 with `--debug` and ~14:50
without it, but now completes in ~2:20 on my Windows laptop.

Differential Revision: https://phab.mercurial-scm.org/D10636
Matt Harbison
tests: change the fixer commands to use the buffer attribute on stdio objects

Otherwise `\r` was getting injected into the fixed lines and throwing off the
commit hashes on Windows when the fixer is invoked with py3.

Differential Revision: https://phab.mercurial-scm.org/D10637
Matt Harbison
tests: invoke some shell scripts through the shell interpreter for Windows

Otherwise, Windows was prompting what program to use to open the file (or just
opening it if there was a file association configured).

Differential Revision: https://phab.mercurial-scm.org/D10635
Matt Harbison
tests: run python script through quoted interpreter instead of directly

This helps Windows when python is installed to %PROGRAMFILES%.

Differential Revision: https://phab.mercurial-scm.org/D10634
Matt Harbison
tests: ensure `$PYTHON` is quoted for Windows

Global installs of python3 go into "Program Files", and tons of tests fail with
mysterious errors if this isn't quoted.  Most of this is a followup to
0826d684a1b5, but some of these were existing issues.  Shebang lines are
ignored because quoting breaks direct execution; these will need to be launched
indirectly with the quoted `$PYTHON` command.

Differential Revision: https://phab.mercurial-scm.org/D10633
Kévin Lévesque
remotefilelog: use the correct capability when using getfilestype threaded

The function was overlooked when the capability was renamed.

Differential Revision: https://phab.mercurial-scm.org/D10673