Edgewall Software

Ticket #7723 (assigned task)

Opened 3 months ago

Last modified 7 weeks ago

Make repository cache work on multirepos branch

Reported by: rblank Owned by: rblank
Priority: normal Milestone: 0.12
Component: version control Version:
Severity: normal Keywords: multirepos
Cc: cboos, gwk.rko@…

Description

This ticket tracks the progress in making the repository cache work in the multirepos branch (#2086), as discussed in MultipleRepositorySupport/Cache.

Attachments

7723-multirepos-cache-r7590.patch (15.9 KB) - added by rblank 3 months ago.
First try at a simple version of the cache
7723-multirepos-cache-2-r7590.patch (31.1 KB) - added by rblank 3 months ago.
Patch update
7723-multirepos-cache-3-r7684.patch (34.5 KB) - added by rblank 8 weeks ago.
Updated patch with better resync command
7723-multirepos-cache-4-r7684.patch (36.2 KB) - added by rblank 7 weeks ago.
Update improving handling of repository aliases

Change History

Changed 3 months ago by rblank

First try at a simple version of the cache

  Changed 3 months ago by rblank

The patch above implements the simplest version of the multi-repository cache:

  • Added a repository table with a generic structure similar to system for holding repository_dir and youngest_rev for every repository.
  • Added an id column to revision and node_change to contain the repository key.
  • Updated CachedRepository and trac-admin resync accordingly.
  • Added a DB upgrade script.

This is probably the simplest solution to get the cache working again, with the minor drawback that data duplication is possible with scoped repositories.

The cache is currently still synced on every request. But I'd like to get the cache structure right before I tackle the next problem.

follow-up: ↓ 3   Changed 3 months ago by cboos

I just read briefly the patch, it looks good, I'll test it at some later point.

There's a possible problem with your last, unrelated, change in that patch:

- ngettext('Changeset', 'Changesets', single and 1 or 2)
+ ngettext('Changeset ', 'Changesets ', single and 1 or 2)

I'm not entirely sure, but I think that at some point during the extraction the trailing spaces are stripped, that's why I did it the other way round. Would be worth checking.

in reply to: ↑ 2 ; follow-up: ↓ 4   Changed 3 months ago by rblank

  • status changed from new to assigned

Replying to cboos:

I'm not entirely sure, but I think that at some point during the extraction the trailing spaces are stripped, that's why I did it the other way round. Would be worth checking.

At least 0.11-stable and trunk have the space in the 'Changeset ' string. If you put the space into the <em>, the dotted underline below "pseudo-links" in the timeline continues below the space, which looks strange. That's why I made the change. Looking at the generated HTML, the space still seems to be there.

in reply to: ↑ 3   Changed 3 months ago by cboos

Replying to rblank:

If you put the space into the <em>, the dotted underline below "pseudo-links" in the timeline continues below the space, which looks strange.

Ok

... Looking at the generated HTML, the space still seems to be there.

Well, I was referring to their extraction using Babel (python setup.py extract_messages), but now I verified and at least with 0.9.4, the leading and trailing spaces are correctly preserved (I was probably wrong about that stripping). So I'm fine with this change as well.

  Changed 3 months ago by gwk.rko@…

  • cc gwk.rko@… added

  Changed 3 months ago by anonymous

wszytsko dziła poprawnie

Changed 3 months ago by rblank

Patch update

  Changed 3 months ago by rblank

Here's an updated patch with the following changes w.r.t. the previous one:

  • Added autocompletion for trac-admin resync.
  • Use (default) as an ID for the default repository.
  • Added IRepositoryChangeListener and the corresponding change notification chain. Notification of new changesets are triggered by trac-admin repository changeset.
  • Fixed failing test cases. This involved changing the initial creation of the youngest_rev key in the repository table.

I am not sure about a few points:

  • Previously, the youngest_rev value in the system table was created during DB creation. This cannot be done anymore for the repository table, so I moved it into CachedRepository.sync(), in the same transaction where repository_dir is created. But I am not sure if this could introduce race conditions.
  • Changeset notification tries to be smart about scoped repositories, so that only a single call to trac-admin repository changeset is needed. In particular, when notifying new changesets for a scoped repository, it will also notify all repositories having the same "base" repository. For this, I had to introduce Repository.get_base(). I am not sure that this is the right way to go about this.
  • It is also possible to specify the base repository name (e.g. svn:{uuid}:/path/to/repos) as the repository in trac-admin repository changeset, but this exposes the (internal) repository name.
  • The call to RepositoryManager.notify_changesets_added() from console.py goes through the environment, in the same way as get_repository(). Maybe I should use RepositoryManager(env) directly.

  Changed 3 months ago by rblank

  • milestone set to 0.12

follow-up: ↓ 10   Changed 3 months ago by cboos

As a reminder: we should do nothing when calling resync and there's no repository available (#7524).

On a related note, what do you think of having:

trac-admin <env> repository resync <repos> [rev]

Resync the specified repository.
If <repos> is not specified, do nothing.
If <repos> is '*', resync all repositories.

I think that as resyncing all repositories could be potentially very expensive, so better do that when specifically told to do so. Also, moving the resync as a sub-command of repository seems nicer.

in reply to: ↑ 9   Changed 3 months ago by rblank

Replying to cboos:

As a reminder: we should do nothing when calling resync and there's no repository available (#7524).

I'll look into that.

I think that as resyncing all repositories could be potentially very expensive, so better do that when specifically told to do so. Also, moving the resync as a sub-command of repository seems nicer.

Yes, that's a good idea.

On a related note as well, I was thinking of starting to work on your suggestion on trac-dev for the rest of the repository command set:

  • repository add <type> <path>: add a repository
  • repository delete <repos>: remove a repository
  • repository rename <old> <new>: rename a repository
  • repository alias <repos> <alias>: create an alias for a repository

and managing repositories in the repository table. I would also like to create an admin module allowing to perform the same operations. Should I merge this into the patch tracked here, or do you prefer a separate patch?

follow-up: ↓ 12   Changed 3 months ago by cboos

I think it will be better addressed separately.

(Ideally we'd need to have an IAdminConsoleCommandProvider extension, so that we could move all this logic into a trac/versioncontrol/admin.py module, where the command line and the web admin modules could reuse the same functionality).

A few notes on the changeset notification in the patch (in trac/versioncontrol/api.py):

        289             def notify_repository(repos): 
 	290	            repos.sync() 
 	291	            for rev in revs: 

Here we should probably take care of not notifying twice about a given change. Even if this should normally not happen if trac-admin repository changeset is called from a hook, I think it would be more robust to have repos.sync() return the list of changesets that were effectively synced and only notify for those changesets, as that method takes great care of race conditions (even if I'm not 100% confident that there's no race condition left...).

in reply to: ↑ 11 ; follow-up: ↓ 15   Changed 3 months ago by rblank

Replying to cboos:

I think it will be better addressed separately.

Ok, I'll open a new ticket about that. I'll keep a new patch there that should be applied on top of this one. Thank God for Mercurial and its easy branching / merging :-)

(Ideally we'd need to have an IAdminConsoleCommandProvider extension, so that we could move all this logic into a trac/versioncontrol/admin.py module, where the command line and the web admin modules could reuse the same functionality).

Hey, that's an excellent idea! I'll also open a new ticket to keep track of that.

Here we should probably take care of not notifying twice about a given change. Even if this should normally not happen if trac-admin repository changeset is called from a hook (...)

I'm not sure about that. New changeset notification and caching are actually completely orthogonal functionalities, and you certainly also want notification for non-cached repositories. So you probably want to keep the "last notified" information separate from the cache, probably in the repository table as well.

But I'm not sure it's worth the trouble avoiding duplicate notification. As you say, calling the notification from the post-commit hook should already ensure no duplication occurs.

Changed 8 weeks ago by rblank

Updated patch with better resync command

  Changed 8 weeks ago by rblank

The patch above is updated to the current state of the multirepos branch.

  • The resync command has been changed to work as suggested in comment:9.
  • About #7524, the recent refactoring of trac-admin changed the way commands are provided. In this case, disabling the default repository prevents the VersionControlAdmin component from loading, and therefore the repository commands are not provided.

This completes the basic version described in MultipleRepositorySupport/Cache. I'm waiting for feedback and instructions about the next steps on this issue. In the meantime, I'll start on the repository management functionality in #7822.

Changed 7 weeks ago by rblank

Update improving handling of repository aliases

  Changed 7 weeks ago by rblank

The patch above improves handling of repository aliases, which ended up being cached separately before. It slightly modifies RepositoryManager.get_repository() so that the same object is returned for a real repository and an alias. Therefore, repos.reponame is always the name of the real repository.

Christian, do you think you could find some time to review this patch and give me some feedback about what must still be changed (or apply it if it is ok)?

in reply to: ↑ 12   Changed 7 weeks ago by cboos

Replying to rblank:

Replying to cboos:

Here we should probably take care of not notifying twice about a given change. Even if this should normally not happen if trac-admin repository changeset is called from a hook (...)

I'm not sure about that. New changeset notification and caching are actually completely orthogonal functionalities, and you certainly also want notification for non-cached repositories. So you probably want to keep the "last notified" information separate from the cache, probably in the repository table as well.

Another reason why this still bothers me a bit is that I'd like this mechanism to be generic enough so that it can be extended later to other cached vcs. Take the case of Mercurial, the hook can inform you of which changesets have been added, which would be something non completely trivial to recompute on the Trac end. This is why I have the feeling that we should carry the changeset(s) information from the notify to the resync. I haven't started coding anything yet, so it's not anything more than a feeling, I might well be wrong ;-)

Add/Change #7723 (Make repository cache work on multirepos branch)

Author



Change Properties
<Author field>
Action
as assigned
as The resolution will be set. Next status will be 'closed'
to The owner will change from rblank. Next status will be 'new'
 
Note: See TracTickets for help on using tickets.