Martin von Zweigbergk
copy: add experimental support for unmarking committed copies

The simplest way I'm aware of to unmark a file as copied after
committing is this:

  hg uncommit --keep <dest>
  hg forget <dest>
  hg add <dest>
  hg amend

This patch teaches `hg copy --forget` an `--at-rev` argument to simplify that into:

  hg copy --forget --at-rev . <dest>

In addition to being simpler, it doesn't touch the working copy, so it
can easily be used even if the destination file has been modified in
the working copy.

I'll teach `hg copy` without `--forget` to work with `--at-rev` next.

Differential Revision: https://phab.mercurial-scm.org/D8030
Martin von Zweigbergk
copy: move argument validation a little earlier

Argument validation is usually done early and I will want it done
before some code that I'm about to add.

Differential Revision: https://phab.mercurial-scm.org/D8033
Martin von Zweigbergk
copy: add option to unmark file as copied

To unmark a file as copied, the user currently has to do this:

  hg forget <dest>
  hg add <dest>

The new command simplifies that to:

  hg copy --forget <dest>

That's not a very big improvement, but I'm planning to also teach `hg
copy [--forget]` a `--at-rev` argument for marking/unmarking copies
after commit (usually with `--at-rev .`).

Differential Revision: https://phab.mercurial-scm.org/D8029
Pierre-Yves David
nodemap: introduce an option to use mmap to read the nodemap mapping

The performance and memory benefits are much greater if we don't have to copy
all the data into memory for each lookup. So we introduce an option (on by
default) to read the data using mmap.

This changeset is the last one defining the API for index support of nodemap
data (the indexes have to be able to use the mmapping).
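
The difference between copying and mapping can be sketched in plain Python.
This is only an illustration, not Mercurial's implementation: the temporary
file is a hypothetical stand-in for a nodemap data file, and `read_copy` and
`read_mmap` are names made up for this example.

```python
import mmap
import os
import tempfile


def read_copy(path):
    """Read the whole file into a bytes object (one full copy in memory)."""
    with open(path, "rb") as fp:
        return fp.read()


def read_mmap(path):
    """Map the file read-only; the OS loads pages lazily on access."""
    with open(path, "rb") as fp:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ)


# Hypothetical stand-in for a nodemap data file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fp:
    fp.write(b"\x00" * 4096 + b"nodemap-block")

copied = read_copy(path)
mapped = read_mmap(path)

# Both expose the same bytes; only the mmap avoids the up-front copy.
same_tail = copied[-13:] == mapped[-13:]

mapped.close()
os.remove(path)
```

With the mapping, only the pages that are actually touched need to be read,
which is presumably why the index implementations have to accept mmapped data.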

Below are some benchmarks comparing the best we currently have in 5.3 with the
final step of this series (using the persistent nodemap implementation in
Rust). The benchmarks run `hg perfindex` with various revsets and the following
variants:

Before:

* do not use the persistent nodemap
* use the CPython implementation of the index for nodemap
* use mmapping of the changelog index

After:

* use the MixedIndex Rust code, with the NodeTree object for nodemap access
  (still in review)
* use the persistent nodemap data from disk
* access the persistent nodemap data through mmap
* use mmapping of the changelog index
The persistent nodemap greatly speeds up most operations on very large
repositories. Some of the previously very fast lookups end up a bit slower
because the persistent nodemap has to be set up. However, the absolute slowdown
is very small and won't matter in the big picture.

Here are some numbers (in seconds) for the reference copy of mozilla-try:

Revset             Before    After     abs-change  speedup
-10000:            0.004622  0.005532   0.000910   ×   0.83
-10:               0.000050  0.000132   0.000082   ×   0.37
tip                0.000052  0.000085   0.000033   ×   0.61
0 + (-10000:)      0.028222  0.005337  -0.022885   ×   5.29
0                  0.023521  0.000084  -0.023437   × 280.01
(-10000:) + 0      0.235539  0.005308  -0.230231   ×  44.37
(-10:) + :9        0.232883  0.000180  -0.232703   ×1293.79
(-10000:) + (:99)  0.238735  0.005358  -0.233377   ×  44.55
:99 + (-10000:)    0.317942  0.005593  -0.312349   ×  56.84
:9 + (-10:)        0.313372  0.000179  -0.313193   ×1750.68
:9                 0.316450  0.000143  -0.316307   ×2212.93
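
The derived columns appear to follow abs-change = After - Before and
speedup = Before / After, so a speedup below 1 is a slowdown. A small sketch
that re-derives two rows under that assumption (`compare` is a name made up
for this example):

```python
def compare(before, after):
    """Return (abs_change, speedup) for two timings in seconds."""
    return after - before, before / after


# Timings copied from the `0` and `tip` rows of the mozilla-try table.
change_0, speedup_0 = compare(0.023521, 0.000084)
change_tip, speedup_tip = compare(0.000052, 0.000085)

print("0:   %.6f x%.2f" % (change_0, speedup_0))
print("tip: %.6f x%.2f" % (change_tip, speedup_tip))
```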

On smaller repositories, the cost of nodemap-related operations is not as big,
so the win is much more modest. Yet it helps shave a handful of milliseconds
here and there.

Here are some numbers (in seconds) for the reference copy of mercurial:

Revset             Before    After     abs-change  speedup
-10:               0.000065  0.000097   0.000032   ×  0.67
tip                0.000063  0.000078   0.000015   ×  0.80
0                  0.000561  0.000079  -0.000482   ×  7.10
-10000:            0.004609  0.003648  -0.000961   ×  1.26
0 + (-10000:)      0.005023  0.003715  -0.001307   ×  1.35
(-10:) + :9        0.002187  0.000108  -0.002079   × 20.25
(-10000:) + 0      0.006252  0.003716  -0.002536   ×  1.68
(-10000:) + (:99)  0.006367  0.003707  -0.002660   ×  1.71
:9 + (-10:)        0.003846  0.000110  -0.003736   × 34.96
:9                 0.003854  0.000099  -0.003755   × 38.92
:99 + (-10000:)    0.007644  0.003778  -0.003866   ×  2.02

Differential Revision: https://phab.mercurial-scm.org/D7894
Martin von Zweigbergk
copy: add experimental support for marking committed copies

The simplest way I'm aware of to mark a file as copied/moved after
committing is this:

  hg uncommit --keep <src> <dest>  # <src> needed for move, but not copy
  hg mv --after <src> <dest>
  hg amend

This patch teaches `hg copy` an `--at-rev` argument to simplify that into:

  hg copy --after --at-rev . <src> <dest>

In addition to being simpler, it doesn't touch the working copy, so it
can easily be used even if the destination file has been modified in
the working copy.

Differential Revision: https://phab.mercurial-scm.org/D8035
Raphaël Gomès
rust-dirstatemap: directly return `non_normal` and `other_entries`

This cleans up the interface which I previously thought needed to be uglier
than in reality. No performance difference, simple refactoring.

Differential Revision: https://phab.mercurial-scm.org/D8121
Martin von Zweigbergk
debugmergestate: make templated

Our IntelliJ team wants to be able to read the merge state in order to
help the user resolve merge conflicts. They had so far been reading
file contents from p1() and p2() and their merge base. That is not
ideal for several reasons (merge base is not necessarily the "graft
base", renames are not handled, commands like `hg update -m` are not
handled). It will get especially bad as of my D7827. This patch makes
the output templated. I haven't bothered to make it complete
(e.g. merge driver states are not handled), but it's probably good
enough as a start.

I've done a web search for "debugmergestate" and I can't find any
indication that any tools currently rely on its output. If it turns
out that we get bug reports for it once this is released, I won't
object to backing this patch out on the stable branch (and then
perhaps replace it by a separate command, or put it behind a new

Differential Revision: https://phab.mercurial-scm.org/D8113
Martin von Zweigbergk
copy: rewrite walkpat() to depend less on dirstate

I want to add a `hg cp/mv -r <rev>` option to mark files as
copied/moved in an existing commit (amending that commit). The code
needs to not depend on the dirstate for that.

Differential Revision: https://phab.mercurial-scm.org/D8031
Martin von Zweigbergk
copy: rename `wctx` to `ctx` since it will not necessarily be working copy

Differential Revision: https://phab.mercurial-scm.org/D8032
Martin von Zweigbergk
merge with stable
Pierre-Yves David
test: pin the number of CPU for issue4074 tests

On machines with hundreds of CPUs, the "user" CPU time reported can be
inflated by the status steps. Since the test specifically focuses on the diff
computation, we restrict the number of CPUs to avoid potential issues.

Differential Revision: https://phab.mercurial-scm.org/D8112
Raphaël Gomès
rust-dirstatemap: cache non normal and other parent set

Performance of `hg update` was significantly worse since the introduction of
the Rust `dirstatemap`. This regression was noticed by Valentin Gatien-Baron
when working on a large repository, as it goes unnoticed for smaller
repositories like Mercurial itself.

This fix introduces the same getter/setter mechanism at `hg-core` level as
for `set/get_dirs`.

While this technique is, as previously discussed, quite suboptimal, it fixes an
important enough problem. Refactoring `hg-core` to use the typestate
pattern could be a good approach to improving code quality in a future patch.

This is a graft on stable of 83b2b829c94e

Differential Revision: https://phab.mercurial-scm.org/D8110
Raphaël Gomès
rust-dirstatemap: add `NonNormalEntries` class

This fix introduces the same encapsulation as the `copymap`. There is no easy
way of doing this any better for now.

`hg up -r null && time HGRCPATH= HGMODULEPOLICY=rust+c hg up tip` on Mozilla
Central, (not super recent, but it doesn't matter):

Before: 7:44,08 total
After: 1:03,23 total

Pretty brutal regression!

This is a graft on stable of cf1f8660e568

Differential Revision: https://phab.mercurial-scm.org/D8111
Yuya Nishihara
pathutil: resurrect comment about path auditing order

It was removed at 51c86c6167c1, but expensive symlink traversal isn't the
only reason we should walk path components from the root.
Yuya Nishihara
chgserver: spawn new process if schemes change

The schemes extension updates hg.schemes table. It's technically possible
for hg.repository() to look for e.g. ui.schemes instead of depending on
module-local table, but I don't think the change would make much sense
since [schemes] is usually specified in ~/.hgrc and thus it can be considered
static data.
Raphaël Gomès
rust-dirstatemap: remove additional lookup in dirstate.matches

We use the same trick as the Python implementation

Differential Revision: https://phab.mercurial-scm.org/D7119
Georges Racinet
rust-nodemap: insert method

In this implementation, we are in direct competition
with the C version: this Rust version will have a clear
startup advantage because it will read the data from disk,
but the insertion happens all in memory for both.

Differential Revision: https://phab.mercurial-scm.org/D7795
Valentin Gatien-Baron
recover: don't verify by default

The reason is:
- it's not that hard to trigger interrupted transactions: just run out
  of disk space
- it takes forever to verify on large repos. Before --no-verify, I
  told people to C-c hg recover when the progress bar showed up. Now I
  tell them to pass --no-verify.
- I don't remember a single case where the verification step was

This is technically a change of behavior. Perhaps this would be better
suited for tweakdefaults?

Differential Revision: https://phab.mercurial-scm.org/D7972
Raphaël Gomès
rust-matchers: implement `visit_children_set` for `FileMatcher`

As per the removed inline comment, this will become useful in a future patch
in this series as the `IncludeMatcher` is introduced.

Differential Revision: https://phab.mercurial-scm.org/D7914
Augie Fackler
context: use manifest.find() instead of two separate calls

I noticed this while debugging an extension that's implementing the manifest
interface. Always nice to save a function call.
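
The saving can be pictured with a toy class. `ToyManifest` is a hypothetical
stand-in written for this example, not Mercurial's manifest implementation;
the only thing taken from the change above is that `find()` returns the node
and the flags from a single call.

```python
class ToyManifest:
    """Hypothetical stand-in for a manifest: path -> (node, flags)."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def __getitem__(self, path):
        # First lookup: the node only.
        return self._entries[path][0]

    def flags(self, path):
        # Second lookup: the flags only.
        return self._entries[path][1]

    def find(self, path):
        """Return (node, flags) with a single lookup."""
        return self._entries[path]


m = ToyManifest({b"a.txt": (b"\x12" * 20, b"x")})

# Two separate calls, each doing its own lookup.
node, flags = m[b"a.txt"], m.flags(b"a.txt")

# One call returning both values.
node2, flags2 = m.find(b"a.txt")
```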

Differential Revision: https://phab.mercurial-scm.org/D8109
Martin von Zweigbergk
Augie Fackler
manifest: move matches method to be outside the interface

In order to adequately smoke out any legacy consumers of the method, we rename
it to _matches so it's clear that it's class-private. To my amazement, all
consumers of this method really only wanted matching filenames, not a full
filtered manifest.

Differential Revision: https://phab.mercurial-scm.org/D8085
Martin von Zweigbergk
tests: add test of rebase with conflict in merge commit

It doesn't seem like we had any tests of this. I think it's pretty
weird that the two parents we're merging are not the working copy
parents during the conflict resolution.

Differential Revision: https://phab.mercurial-scm.org/D7824
Augie Fackler
tags: use modern // operator for division

Fixes a test on Python 3.
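
As a reminder of why this matters, here is a minimal sketch of the Python 3
behavior. The values are made up for illustration and are not taken from the
tags code.

```python
count = 7  # hypothetical value; any int works

true_div = count / 2    # always a float on Python 3
floor_div = count // 2  # an int on both Python 2 and Python 3

data = b"abcdefg"
# Slice bounds must be integers, so `//` is required here on Python 3.
half = data[: len(data) // 2]
```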

# skip-blame only correcting a division operator, not a substantive change

Differential Revision: https://phab.mercurial-scm.org/D8108
Martin von Zweigbergk
tests: add `hg log -G` output when there are merge conflicts

The next commit will change the behavior for these. I've used slightly
different commands in the different tests to match the surrounding

Differential Revision: https://phab.mercurial-scm.org/D8042
Martin von Zweigbergk
rebase: stop relying on having two parents to resume rebase

I'm about to make it so we don't have two parents when a rebase is
interrupted (unless we're just rebasing on a merge commit). The code
for detecting if we're resuming a rebase relied on having two parents,
so this patch rewrites that to instead set a boolean when we resume.

Note that `self.resume` in the new condition implies `not
self.inmemory` (rebase cannot be resumed in memory), so that's why
that part can be omitted.

Differential Revision: https://phab.mercurial-scm.org/D7826
Augie Fackler
tags: fix some type confusion exposed in python 3

# skip-blame just b-prefix and %-format cleanup, no meaningful change

Differential Revision: https://phab.mercurial-scm.org/D8107
Martin von Zweigbergk
rebase: remove some now-unused parent arguments

Differential Revision: https://phab.mercurial-scm.org/D7829
Martin von Zweigbergk
graphlog: use '%' for other context in merge conflict

This lets the user more easily find the commit that is involved in the
conflict, such as the source of `hg update -m` or the commit being
grafted by `hg graft`.

Differential Revision: https://phab.mercurial-scm.org/D8043
Martin von Zweigbergk
rebase: remove some redundant setting of dirstate parents

Since we're now setting the dirstate parents to their correct values
from the beginning (right after `merge.update()`), we usually don't
need to set them again before committing. The only case we need to
care about is when committing collapsed commits. So we can remove the
`setparents()` calls just before committing and add one only for the
collapse case.

Differential Revision: https://phab.mercurial-scm.org/D7828
Martin von Zweigbergk
rebase: always be graft-like, not merge-like, also for merges

Rebase works by updating to a commit and then grafting changes on
top. However, before this patch, it would actually merge in changes
instead of grafting them in in some cases. That is, it would use the
common ancestor as base instead of using one of the parents. That
seems wrong to me, so I'm changing it so `defineparents()` always
returns a value for `base`.

This fixes the bad behavior in test-rebase-newancestor.t, which was
introduced in 65f215ea3e8e (tests: add test for rebasing merges with
ancestors of the rebase destination, 2014-11-30).

The difference in test-rebase-dest.t is because the files in the tip
revision were A, D, E, F before this patch and A, D, F, G after it. I
think both files should ideally be there.

Differential Revision: https://phab.mercurial-scm.org/D7907
Martin von Zweigbergk
revset: add a revset for parents in merge state

This may be particularly useful soon, when I'm going to change how `hg
rebase` sets its parents during conflict resolution.

Differential Revision: https://phab.mercurial-scm.org/D8041
Martin von Zweigbergk
rebase: don't use rebased node as dirstate p2 (BC)

When rebasing a node, we currently use the rebased node as p2 in the
dirstate until just before we commit it (we then change to the desired
parents). This p2 is visible to the user when the rebase gets
interrupted because of merge conflicts. That can be useful to the user
as a reminder of which commit is currently being rebased, but I
believe it's incorrect for a few reasons:

* I think the dirstate parents should be the ones that will be set
  when the commit is created.

* I think having two parents means that you're merging those two
  commits, but when rebasing, you're generally grafting, not merging.

* When rebasing a merge commit, we should use the two desired parents
  as dirstate parents (and we clearly can't have the rebased node as
  a third dirstate parent).

* `hg graft` (and `hg update --merge`) sets only one parent and `hg
  rebase` should be consistent with that.

I realize that this is a somewhat large user-visible change, but I
think it's worth it because it will simplify things quite a bit.

Differential Revision: https://phab.mercurial-scm.org/D7827
Martin von Zweigbergk
tests: add workaround for bzr bug

This started failing for me today. I guess my bzr was upgraded.

Differential Revision: https://phab.mercurial-scm.org/D8105
Pierre-Yves David
nodemap: track the total and unused amount of data in the rawdata file

We need to keep that information around:

* total data will allow a transaction to start appending new information
  without confusing other readers.

* unused data will allow us to detect when we should regenerate a new rawdata
  file.
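
The two counters can be sketched as follows. Everything here is an assumption
made for illustration: `ToyDocket`, its attribute names, and the 10% threshold
are hypothetical and do not describe the actual on-disk format.

```python
class ToyDocket:
    """Hypothetical bookkeeping for an append-only rawdata file."""

    def __init__(self):
        self.data_length = 0  # total bytes of data written so far
        self.data_unused = 0  # bytes superseded by later appends

    def append(self, new_bytes, obsoleted_bytes=0):
        """Record an append and return the offset where it started."""
        offset = self.data_length
        self.data_length += new_bytes
        self.data_unused += obsoleted_bytes
        return offset

    def should_regenerate(self, max_unused_ratio=0.1):
        """True once too much of the file is dead data (ratio is made up)."""
        if self.data_length == 0:
            return False
        return self.data_unused / self.data_length > max_unused_ratio


docket = ToyDocket()
docket.append(640)
# A reader that saw data_length == 640 can ignore everything appended later.
offset = docket.append(64, obsoleted_bytes=64)
regenerate_now = docket.should_regenerate()
```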

Differential Revision: https://phab.mercurial-scm.org/D7889
Pierre-Yves David
nodemap: add basic checking of the on disk nodemap content

The simplest check is to verify we have all the revisions we need, and nothing

Differential Revision: https://phab.mercurial-scm.org/D7845
Pierre-Yves David
nodemap: double check the source docket when doing incremental update

In theory, the index will have the information we expect it to have. However,
to be safe, it seems better to double check that the incremental data are
generated from the data currently on disk.

Differential Revision: https://phab.mercurial-scm.org/D7890
Pierre-Yves David
nodemap: keep track of the ondisk id of nodemap blocks

If we are to incrementally update the files, we need to keep some details about
the data we read.

Differential Revision: https://phab.mercurial-scm.org/D7883
Pierre-Yves David
nodemap: code to parse the persistent binary nodemap data

We now have code to read back what we persisted. This will be put to use in
later changesets.

Differential Revision: https://phab.mercurial-scm.org/D7844