RFC: bob layers #561

Draft: wants to merge 2 commits into master

@rhubert
Contributor

rhubert commented Apr 2, 2024

This is a very rough first draft of support for layers handled by bob to get some early comments about this.

The idea is to add an SCM spec to the layers section of the config.yaml:

layers:
  - foo:
     scm: git
     url: git@foo:/foo.git
     branch: master
     commit: ....

A layer specified like this also changes the behavior of nested layers where multiple layers depend on the same layer: the layer structure is flattened and each layer is checked out only once. This makes it possible to use the same recipes either as a root recipes repo or as a layer of another recipes package. (ATM this is done by matching the name of the layer only; maybe the URL should be used instead, or an additional UUID, ... ??)

ATM an internal .bob_layers root package is generated for this, with a dependency on one package per layer. I don't know if this is a good idea but it enables reusing most of the code. ;)

If this is a viable approach it would need some filtering in other commands like bob ls so that the .bob_layers package is not shown there as well. Maybe this filtering can be done by conditionally adding .bob_layers to the virtualRoot, with the layers command providing the define? But this would result in longer parsing times when switching from bob layers to any other bob command.

@jkloetzke
Member

Taking a step back: what is the actual motivation of having Bob managing the layers? What is the benefit of adding this feature compared to git submodules? I think we should have a compelling benefit to add the complexity. Could you shed some more light on this?

If we go this route, the following things come to my mind:

  • Collapsing the layers hierarchy is probably OK. I don't think that anybody has ever used nested layers.
  • Internally treating the layers with the same logic will most probably not fly.
    • Nested layers require some loop to discover and parse the next level. The current logic can handle only exactly one level.
    • Error messages and other output will probably be misleading.
  • I could imagine that one may want to optionally automatically update layers unless building with --build-only.

@rhubert
Contributor Author

rhubert commented Apr 7, 2024

> Taking a step back: what is the actual motivation of having Bob managing the layers? What is the benefit of adding this feature compared to git submodules? I think we should have a compelling benefit to add the complexity. Could you shed some more light on this?

There are two reasons:

My current layer structure would look like this if it were supported:

- recipes a
   - layer b
     - layer c
   - layer c
   - layer d 
     - layer b
       - layer c
     - layer c

Layer d also contains recipes for a standalone product and it would be nice to be able to build them separately from recipes a. As of today this is not possible without changing the config.yaml of d and adding c and d to the layers.
The remaining dependencies are just invisible and one would need to know them when reusing one of the other layers.

IMO this can't be solved with the current layers except by splitting the recipes much more. Or do you have any suggestions how this could be solved?

The other reason is "because some developers don't like submodules." We have (at least) one project where the layer content has been copied into the recipes just to avoid the need of using submodules. (I'm not sure if using bob managed layers would help them...)

> If we go this route, the following things come to my mind:
>
> * Collapsing the layers hierarchy is probably OK. I don't think that anybody has ever used nested layers.

I guess we can also add a policy for this?

> * Internally treating the layers with the same logic will most probably not fly.
>
>   * Nested layers require some loop to discover and parse the next level. The current logic can handle only exactly one level.

Yes, the loop is missing.

>   * Error messages and other output will probably be misleading.
>
> * I could imagine that one may want to _optionally_ automatically update layers unless building with `--build-only`.

I also thought about this as it would avoid the need of adding a new command. Not sure what would be more convenient.

@jkloetzke
Member

> > Taking a step back: what is the actual motivation of having Bob managing the layers? What is the benefit of adding this feature compared to git submodules? I think we should have a compelling benefit to add the complexity. Could you shed some more light on this?
>
> There are two reasons:
> ...
> Layer d also contains recipes for a standalone product and it would be nice to be able to build them separately from recipes a. As of today this is not possible without changing the config.yaml of d and adding c and d to the layers. The remaining dependencies are just invisible and one would need to know them when reusing one of the other layers.
>
> IMO this can't be solved with the current layers except by splitting the recipes much more. Or do you have any suggestions how this could be solved?

Right. This makes sense, and I cannot see how it could be solved with git submodules either. So let's do it.

> The other reason is "because some developers don't like submodules." We have (at least) one project where the layer content has been copied into the recipes just to avoid the need of using submodules. (I'm not sure if using bob managed layers would help them...)

I'm not sure either. But having an "automatic" layers update that behaves like the SCMs in regular recipes might be more attractive.

> > If we go this route, the following things come to my mind:
> >
> > * Collapsing the layers hierarchy is probably OK. I don't think that anybody has ever used nested layers.
>
> I guess we can also add a policy for this?

No, I wouldn't unless somebody speaks up. I have never heard of anybody using nested layers. We can still add such a policy if somebody comes around.

> > • I could imagine that one may want to optionally automatically update layers unless building with --build-only.
>
> I also thought about this as it would avoid the need of adding a new command. Not sure what would be more convenient.

I think we need both. The layers command makes perfect sense. I was thinking about the case where somebody clones a project and expects it to "just work". So

git clone ...
cd project
bob dev ...

should just work, even when layers are used. On the other hand, you might not always want the automatic layer pulling. So some overridable option to bob build/dev makes sense.
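The default-plus-override behaviour described here could be sketched like this. It is only an illustration, not Bob's argument parser: `--build-only` exists today, but the `--layers-update` / `--no-layers-update` pair and the `should_update_layers` helper are hypothetical names made up for this example.

```python
import argparse


def should_update_layers(argv):
    """Decide whether layers are fetched before building.

    --build-only is an existing bob dev/build switch. The
    --layers-update / --no-layers-update options are hypothetical.
    """
    parser = argparse.ArgumentParser(prog="bob dev")
    parser.add_argument("--build-only", action="store_true")
    # Both switches write to the same destination; None means "not given".
    parser.add_argument("--layers-update", dest="layers_update",
                        action="store_true", default=None)
    parser.add_argument("--no-layers-update", dest="layers_update",
                        action="store_false", default=None)
    parser.add_argument("packages", nargs="*")
    args = parser.parse_args(argv)

    if args.layers_update is not None:
        return args.layers_update   # an explicit choice always wins
    return not args.build_only      # default: update unless --build-only
```

With this, a plain `bob dev pkg` would pull layers, `bob dev --build-only pkg` would not, and either default could be overridden explicitly.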

@sbixl

sbixl commented Apr 8, 2024

We have similar use cases as reported by @rhubert. I often get the question why the layers can't be managed better via bob (in analogy to the SCMs in the recipes).

The current situation is that when a layer is updated, the whole team must be notified so that everyone updates their local git submodule(s). Sometimes this is forgotten and a colleague works on an older version. It would be nice if this update could be done via bob dev/build in the background.

However, if the main recipes have a breaking dependency on one or more layers, these would also have to be updated. In this use-case, updating the layer alone would lead to problems and you would have to explicitly make a pull on the recipes (if you know it, which is mostly not the case).

The use of nested layers also makes sense (and we are now at the point where we absolutely need it). I have already experimented with nested layers but encountered similar problems to those reported by @rhubert.

As mentioned:

  • An "automatic" layers update that behaves like the SCMs in regular recipes would be a great improvement.
  • Being able to run git clone ... without knowing about the git submodules behind it (--recurse-submodules) would improve usability even more.
  • Updating the main recipes in addition to the layers would be nice too. This could be done via a query that has to be confirmed by the user. This way the user knows that something has changed and an update is being carried out. This query could also be made configurable and set to auto-update so that the system does not wait for input e.g. in case of a CI build.

@rhubert
Contributor Author

rhubert commented Apr 8, 2024

> The current situation is that when a layer is updated, the whole team must be notified so that everyone updates their local git submodule(s). Sometimes this is forgotten and a colleague works on an older version. It would be nice if this update could be done via bob dev/build in the background.
>
> However, if the main recipes have a breaking dependency on one or more layers, these would also have to be updated. In this use-case, updating the layer alone would lead to problems and you would have to explicitly make a pull on the recipes (if you know it, which is mostly not the case).

I think the last point would become valid if floating layers are used, which is not possible with submodules(?). And even if layers are managed by bob, this is probably not a good idea. Otherwise the layers are bound to a version of the recipes and can't become incompatible with the recipes? Or am I missing something?

@sbixl

sbixl commented Apr 9, 2024

I had the following scenario in mind (assuming the layers are no longer managed via git submodules):

layers:
  - foo:
     scm: git
     url: git@foo:/foo.git
     branch: master # <-- layer is floating

Changes are made to the layer that also affect the main recipes (e.g. incompatible changes). Developer 1 adjusts the main recipes accordingly and pushes the changes to the remote. Developer 2, or a whole team, knows nothing about this and would automatically update the layer with the next bob dev/build because it is floating. The build then stops working because nobody knows that the recipes also need to be updated. This quickly leads to frustration, especially among colleagues who are not so familiar with recipes and their interrelationships. So if floating layers are an option, there should be an additional check whether the main recipes must be updated too.

But the more I think about it, the less sense it makes to work with floating layers. If several layers are nested inside each other, this increases the complexity considerably, becomes confusing and is error-prone. I think it would be better to always hard-link the layers to a tag/commit like git submodules already do.
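If layers had to be pinned, a check along these lines could reject floating specs. This is a minimal sketch that assumes the layer SCM spec uses the `branch`, `tag` and `commit` keys from the draft in this PR; `is_floating` is a hypothetical helper, not part of Bob.

```python
def is_floating(layer_scm: dict) -> bool:
    """Return True if a git layer spec is not pinned to a fixed revision.

    Assumption: a spec counts as pinned when it names a commit or a tag;
    a spec with only a branch (or nothing at all) follows a moving target.
    """
    return not (layer_scm.get("commit") or layer_scm.get("tag"))
```

For the example above, `is_floating({"scm": "git", "url": "git@foo:/foo.git", "branch": "master"})` is true, while adding a `commit` entry to the same spec makes it false.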

So I agree that this was not the best idea. ;-)

Nevertheless, it would be a nice feature if bob could notify you when the main recipes are on a branch and there are new changes on it. In our team, at least, this is a recurring use case where you always have to make sure that everyone is really synchronized. I think this can be solved more elegantly (it should be relatively easy for Bob to provide something like this?)

@rhubert
Contributor Author

rhubert commented Apr 10, 2024

> Nevertheless, it would be a nice feature if bob could notify you when the main recipes are on a branch and there are new changes on it. In our team, at least, this is a recurring use case where you always have to make sure that everyone is really synchronized. I think this can be solved more elegantly (it should be relatively easy for Bob to provide something like this?)

I think this can't be easily done as bob has no knowledge about where the recipes are coming from. Could be git, svn, cvs, ... so it might be hard to fetch them.

As of today you can add a preBuildHook to check if your recipes are up-to-date; if they are not, you can output a warning and exit with a non-zero exit code.

https://bob-build-tool.readthedocs.io/en/latest/manual/configuration.html#hooks

Note: you cannot simply git pull your recipes in the hook and exit with 0, as the hook is executed after the recipes have been parsed and the build would still use the old recipes.
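Such a hook could look roughly like the following sketch. It assumes the recipes are a git checkout with an upstream branch configured and that the hook runs from the project root; the function names are made up for this example. It only warns and fails, it does not pull, for the reason given in the note above.

```python
#!/usr/bin/env python3
import subprocess
import sys


def commits_behind(run=subprocess.run) -> int:
    """Count upstream commits that are missing from the local checkout."""
    run(["git", "fetch", "--quiet"], check=True)
    result = run(["git", "rev-list", "--count", "HEAD..@{upstream}"],
                 check=True, capture_output=True, text=True)
    return int(result.stdout.strip())


def main(run=subprocess.run) -> int:
    behind = commits_behind(run)
    if behind:
        print(f"Recipes are {behind} commit(s) behind upstream, "
              "please update them before building.", file=sys.stderr)
        return 1  # non-zero exit code, as suggested above
    return 0


# When installed as the hook script, end the file with: sys.exit(main())
```

The `run` parameter exists only so the git calls can be replaced in a test; a real hook would use the default.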

@rhubert
Contributor Author

rhubert commented Apr 22, 2024

I pushed some updates to the code to avoid the internal package. It's still not ready for review...

Anyway - after some internal discussions we think that merging the (sub-) layers of different layers by their name/url/whatever might not work very well. Instead we want to filter the layers provided by a sublayer when depending on it:

Given a layer foo with

# foo - config.yaml
layers:
 - bar:
   [...]
 - baz:
   [...]

The user of this layer can avoid the use of bar from foo by adding only baz to foo's useSubLayers, e.g. because foo also has a direct dependency on bar:

layers:
 - foo:
    checkoutSCM:
      scm: git
      url: ....
      commit: ...
    useSubLayers: [baz]
 - bar:
   [...]

Not adding useSubLayers will simply use all.
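The filtering could be sketched as follows. `select_sublayers` is a hypothetical helper, and raising an error for names the layer does not provide is an assumption of this sketch, not something decided in this PR.

```python
from typing import List, Optional


def select_sublayers(provided: List[str],
                     use_sub_layers: Optional[List[str]] = None) -> List[str]:
    """Return the sublayers of a layer that the depending project accepts.

    provided: names listed in the layer's own config.yaml, in order.
    use_sub_layers: the proposed useSubLayers list; None means "use all".
    """
    if use_sub_layers is None:
        return list(provided)
    unknown = set(use_sub_layers) - set(provided)
    if unknown:
        # Assumption: asking for a sublayer that does not exist is an error.
        raise ValueError(f"unknown sublayers requested: {sorted(unknown)}")
    # Keep the order of the providing layer, not of the filter list.
    return [name for name in provided if name in use_sub_layers]
```

For the example above, `select_sublayers(["bar", "baz"], ["baz"])` keeps only baz, and leaving out the second argument keeps both.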

@jkloetzke
Member

> Anyway - after some internal discussions we think that merging the (sub-) layers of different layers by their name/url/whatever might not work very well.

I agree that merging them on the basis of the URL is certainly not viable. But why not by name? Projects/layers that include other layers should know the naming below. Only if two layers refer to some common third layer by a different name things go south. But is this really likely to happen?

One could imagine encouraging people to use reverse domain name notation for the layers (e.g. dev.bobbuildtool.basement). That would make the names more predictable.

> Instead we want to filter the layers provided by a sublayer when depending on it:

I'm not sure that this is better. The behaviour can get very subtle, I guess. Also, how is it supposed to work transitively (control the sub-sub-layers)?

I don't have any better idea at the moment but it feels more like a hack...

@rhubert
Contributor Author

rhubert commented Apr 25, 2024

The idea was to have more control over the layers, e.g. if a common sublayer is used by 2 different layers one might need to use the newer layer spec. And if 2 different layers share two common sublayers, they might cross-reference newer versions of their sublayers:

- recipes
  - layer foo
    - layer b (1.0)
    - layer c (1.1)
  - layer bar
    - layer b (1.1)
    - layer c (1.0)

To make this work we'd need to select c from foo and b from bar. Or add b (1.1) and c (1.1) to the recipes?

@jkloetzke
Member

My current line of thinking goes like this:

We would always flatten the layer dependency tree by name. This happens even for layers that are not SCM-managed but just referenced in config.yaml. So whenever layers have the same name, only the first one of them is used. Which one it is, is defined by the order.

The project has the highest precedence. So if a layer "b" is defined in the project, it would override any deeper layer dependencies (layer foo and bar in your case). If the dependencies to some deeper layer are on the same level (b and c in your case), the order is important. It's already defined that layers in config.yaml are named from highest to lowest precedence. So in your case layer b (1.0) and layer c (1.1) would be used because layer foo has a higher precedence than bar. If you want b (1.1) and c (1.1), they must be added to the project (at least b (1.1)).

While it can still be very subtle what is actually used, this is at least consistent with the current layer precedence. For SCM-managed layers, Bob should certainly not even check out layers that were already satisfied by higher-precedence layers. So it will be more or less obvious what is going to be used.
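One possible reading of this rule, as a sketch: walk the layers level by level (project first, then the sublayers in config.yaml order) and let the first occurrence of each name win. The level-by-level traversal is an interpretation of the description above, and `flatten_layers`, the `(name, version)` tuples and the "n/a" placeholder version are made up for this example.

```python
from collections import deque
from typing import Dict, List, Tuple

# A layer reference as it might appear in a config.yaml: (name, version).
LayerRef = Tuple[str, str]


def flatten_layers(project_layers: List[LayerRef],
                   sublayers: Dict[LayerRef, List[LayerRef]]) -> Dict[str, str]:
    """Pick one version per layer name; the first occurrence wins."""
    chosen: Dict[str, str] = {}
    queue = deque(project_layers)  # highest precedence first
    while queue:
        name, version = queue.popleft()
        if name in chosen:
            continue  # satisfied by a higher-precedence layer, never checked out
        chosen[name] = version
        # The sublayers of this layer are visited after everything already queued.
        queue.extend(sublayers.get((name, version), []))
    return chosen


# The foo/bar example from the previous comment.
SUBLAYERS = {
    ("foo", "n/a"): [("b", "1.0"), ("c", "1.1")],
    ("bar", "n/a"): [("b", "1.1"), ("c", "1.0")],
}
```

With the project listing only foo and bar, this selects b (1.0) and c (1.1). Adding `("b", "1.1")` to the project list selects b (1.1) and c (1.1) instead.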

Does that make sense?

3 participants